CN116206139A - Unmanned aerial vehicle image upscaling matching method based on local self-convolution - Google Patents

Unmanned aerial vehicle image upscaling matching method based on local self-convolution Download PDF

Info

Publication number
CN116206139A
CN116206139A CN202211717727.2A
Authority
CN
China
Prior art keywords
image
convolution
point
self
gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211717727.2A
Other languages
Chinese (zh)
Inventor
谷鑫斌
张瑛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202211717727.2A priority Critical patent/CN116206139A/en
Publication of CN116206139A publication Critical patent/CN116206139A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a local self-convolution image upscaling matching method suitable for unmanned aerial vehicle visual navigation, which comprises the following steps: S1, performing self-convolution on each pixel point in a training image to realize the scale-up operation of the training image; S2, extracting features of the self-convolved training image by using the SIFT method to obtain descriptors; S3, matching the training image with the query image based on the extracted SIFT descriptors; S4, eliminating erroneous matches generated in the matching process. The image upscaling matching method provided by the invention can effectively remove unnecessary details of the image while retaining or highlighting important edge information.

Description

Unmanned aerial vehicle image upscaling matching method based on local self-convolution
Technical Field
The invention relates to image matching, in particular to an unmanned aerial vehicle image upscaling matching method based on local self-convolution.
Background
In environments where positioning systems such as GPS (Global Positioning System) are unavailable, visual navigation requires matching the unmanned aerial vehicle's visual image against a reference image (a remote-sensing satellite image carrying geographic position information) in order to determine the specific position of the unmanned aerial vehicle; the key problem to be solved is feature matching between images with different ground resolutions.
1. When no upscaling is performed, the ground resolutions of the image to be matched and the reference image differ greatly, so the local feature points cannot be accurately matched to the global feature points and many mismatched or unmatched feature points appear, as shown in fig. 6.
2. When the scale adjustment is based on downsampling or Gaussian filtering, some feature points that could be matched are lost, as shown in fig. 7.
3. Upscaling based on self-convolution preserves the matchable feature points more effectively, as shown in fig. 8.
In image feature point matching, the query image is generally an image captured by a satellite, and the training image is an image captured by the unmanned aerial vehicle.
Image feature point matching mainly involves two tasks: feature point detection and descriptor extraction. Feature point detection typically relies on corner detectors such as Harris and FAST. Its goal is to find salient points that can be detected in two completely independent detection passes over different images of the same region, which may vary greatly in illumination, scale, rotation, and viewpoint. Descriptor extraction builds a feature vector from the area around each feature point; the goal is to construct a descriptor that can be matched using metrics such as the Euclidean distance or the Hamming distance.
The reference image is typically a large-scale image captured by a satellite camera. When features are detected with methods such as SIFT, the matching time increases and the matching accuracy suffers, because the training image contains many unnecessary details compared with the query image. The training image therefore needs to be upscaled to remove these unnecessary details. Current image upscaling methods mainly filter the image with a Gaussian kernel, but this blurs all of the image information, so some important image structures may not be well preserved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an unmanned aerial vehicle image upscaling matching method based on local self-convolution, which can effectively remove unnecessary details of an image while retaining or highlighting important edge information.
The aim of the invention is realized by the following technical scheme: an unmanned aerial vehicle image upscaling matching method based on local self-convolution comprises the following steps:
s1, performing self-convolution on each pixel point in a training image to realize scale-up operation of the training image;
in step S1, when performing self-convolution on each pixel point in the training image, a matrix of size (2a+1)×(2b+1) centered on that pixel point is used as its convolution kernel;
the pixel values of the pixel points contained in this matrix of size (2a+1)×(2b+1) are used as the matrix elements.
The step S1 includes:
S101, for any point (x0, y0), compute the value g(x0, y0) of the self-convolution image g(x, y) at (x0, y0):
g(x0, y0) = Σ(dx = -a..a) Σ(dy = -b..b) w_(x0,y0)(dx, dy) · f(x0-dx, y0-dy)
wherein
w_(x0,y0) = f_(x0,y0)
f_(x0,y0) represents the matrix of size (2a+1)×(2b+1) of the original image centered on (x0, y0), w_(x0,y0) is the convolution kernel of size (2a+1)×(2b+1) at (x0, y0), and f(x0-dx, y0-dy) represents the pixel value at the pixel point (x0-dx, y0-dy);
S102, repeat step S101 for every point of the training image to obtain the value of the self-convolution image g(x, y) at each point; the resulting self-convolution image g(x, y) is taken as the upscaled image;
for a pixel point on the image boundary, where a full convolution kernel of size (2a+1)×(2b+1) cannot be obtained, the pixel value of that point is taken directly as its self-convolution result.
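A minimal NumPy sketch of step S1 under the formula above, assuming the kernel at each pixel is simply the (2a+1)×(2b+1) patch of the original image centred on it (the function name and the absence of any rescaling of the output are choices of this sketch, not details fixed by the description):

```python
import numpy as np

def local_self_convolution(img, a=1, b=1):
    """Step S1 sketch: upscale an image by local self-convolution.

    For each interior pixel (x0, y0) the (2a+1) x (2b+1) patch centred on it
    is used as its own convolution kernel; boundary pixels, where the full
    window does not fit, keep their original value.
    """
    f = img.astype(np.float64)
    g = f.copy()                                   # boundary pixels stay unchanged
    H, W = f.shape
    for x0 in range(a, H - a):
        for y0 in range(b, W - b):
            patch = f[x0 - a:x0 + a + 1, y0 - b:y0 + b + 1]
            # sum over (dx, dy) of f(x0+dx, y0+dy) * f(x0-dx, y0-dy)
            g[x0, y0] = np.sum(patch * patch[::-1, ::-1])
    return g
```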
S2, extracting features of the self-convolved training image by using the SIFT method to obtain descriptors;
said step S2 comprises the sub-steps of:
S201, constructing a DoG pyramid of the upscaled image g(x, y):
A1, constructing a Gaussian scale space as the Gaussian blur result:
the Gaussian scale space of an image is defined as a function L(x, y, σ) obtained by convolving the Gaussian kernel function G(x, y, σ) with the input image I(x, y):
L(x,y,σ)=G(x,y,σ)*I(x,y)
where * denotes convolution and
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
σ is called the scale space factor; it is the standard deviation of the Gaussian normal distribution and reflects the degree to which the image is blurred: the larger its value, the more blurred the image and the larger the corresponding scale;
A2, first apply Gaussian blur to the obtained upscaled image g(x, y); the Gaussian-blurred version of g(x, y) serves as the first layer of the Gaussian pyramid. Then repeatedly downsample, starting from this blurred image, to obtain a series of successively smaller images; each downsampled image forms one layer, and the layers, taken in downsampling order, form the image pyramid of g(x, y);
A3, Gaussian-blur each layer image of the image pyramid with n sequentially arranged scale space factors to obtain n Gaussian-blurred images with different scale space factors; for any two adjacent scale space factors, the ratio of the latter to the former is k;
For the n Gaussian-blurred images with different scale space factors, the DoG spaces are computed:
for the Gaussian-blurred images corresponding to two adjacent scale space factors, the DoG (Difference of Gaussians) is computed as follows:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
wherein L (x, y, σ) is the gaussian scale space of the image;
Since there are n Gaussian-blurred images with different scale space factors, a total of n-1 DoG spaces are obtained;
A4, repeat step A3 for each layer image of the image pyramid to obtain the DoG pyramid of g(x, y);
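A compact sketch of the pyramid construction just described (A1–A4); the number of layers, n, σ0 = 1.6 and the choice k = 2^(1/(n-1)) are illustrative SIFT-style defaults rather than values fixed by the description:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(g, layers=4, n=5, sigma0=1.6):
    """S201 sketch: Gaussian-blurred images and DoG spaces for each pyramid layer."""
    k = 2.0 ** (1.0 / (n - 1))                # ratio between adjacent scale space factors
    layer = g.astype(np.float64)
    dog = []
    for _ in range(layers):
        # n Gaussian-blurred versions of this layer with increasing scale factors
        blurred = [gaussian_filter(layer, sigma0 * k ** i) for i in range(n)]
        # adjacent differences give the n-1 DoG spaces of this layer
        dog.append([blurred[i + 1] - blurred[i] for i in range(n - 1)])
        layer = blurred[0][::2, ::2]          # downsample to form the next, smaller layer
    return dog
```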
S202, detecting extrema in the DoG pyramid and eliminating the extreme points that do not meet the conditions, so as to obtain the feature points:
B1, for each layer image of the image pyramid, the n-1 DoG spaces obtained from it are processed as follows:
to find the extreme points of the scale space, each pixel point of every DoG space is compared with all of its neighboring points in the same scale space and in the adjacent scale spaces; when its pixel value is larger than the pixel values of all neighboring points or smaller than all of them, the current pixel point is an extreme point;
B2, the obtained extreme points are taken as candidate feature points, and the extreme points that do not meet the conditions are eliminated:
For any candidate feature point x, its offset is defined as Δx and its contrast is the absolute value |D(x)| of D(x); applying a Taylor expansion to D(x):
D(x+Δx) = D(x) + (∂D(x)/∂x)^T·Δx + (1/2)·Δx^T·(∂²D(x)/∂x²)·Δx
Since x is an extreme point of D(x), differentiating the above formula and setting the derivative to 0 gives
Δx = -(∂²D(x)/∂x²)^(-1)·(∂D(x)/∂x)
Substituting the obtained Δx back into the Taylor expansion of D(x) gives
D(x+Δx) = D(x) + (1/2)·(∂D(x)/∂x)^T·Δx
Let the contrast threshold be T; if
|D(x+Δx)| ≥ T
the feature point is retained; otherwise it is removed;
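A sketch of this extremum detection and contrast test: each pixel is compared against its 26 neighbours in the DoG stack and kept only if |D| exceeds the threshold T; the Taylor sub-pixel refinement is deliberately omitted, so this is a simplification of the procedure above, and the default T assumes intensities scaled to [0, 1]:

```python
import numpy as np

def detect_extrema(dog_spaces, T=0.03):
    """S202 sketch: scale-space extrema in the n-1 DoG spaces of one pyramid layer."""
    stack = np.stack(dog_spaces)                       # shape (n-1, H, W)
    keypoints = []
    for s in range(1, stack.shape[0] - 1):
        for x in range(1, stack.shape[1] - 1):
            for y in range(1, stack.shape[2] - 1):
                centre = stack[s, x, y]
                if abs(centre) < T:                    # low-contrast candidate, discard
                    continue
                cube = stack[s - 1:s + 2, x - 1:x + 2, y - 1:y + 2]
                # extremum over the 26 neighbours (8 in this DoG space, 9 in each adjacent one)
                if centre == cube.max() or centre == cube.min():
                    keypoints.append((s, x, y))
    return keypoints
```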
S203, calculating the main direction of the feature points:
at the scale image of the feature point,
L(x,y)=G(x,y,σ)*I(x,y)
the gradient magnitude and direction are computed over the region centered on the feature point with radius 3×1.5σ; the magnitude m(x, y) and direction θ(x, y) of the gradient at each point L(x, y) are obtained by
m(x,y) = sqrt( (L(x+1,y)-L(x-1,y))² + (L(x,y+1)-L(x,y-1))² )
θ(x,y) = arctan( (L(x,y+1)-L(x,y-1)) / (L(x+1,y)-L(x-1,y)) )
After the gradients are computed, a histogram is used to collect the gradient directions and magnitudes of the pixels in the neighborhood of the feature point; the horizontal axis of the histogram is the gradient-direction angle, the vertical axis is the accumulated gradient magnitude for that direction, and the peak of the histogram gives the main direction of the feature point;
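A sketch of the main-direction computation: gradients inside a window of radius 3×1.5σ are accumulated into an orientation histogram and the peak bin is returned (10 bins of 36° each are assumed, one of the two binnings this description mentions; the finite-difference gradients follow the formulas above):

```python
import numpy as np

def main_orientation(L, x, y, sigma, nbins=10):
    """S203 sketch: principal gradient direction of a feature point, in degrees."""
    r = int(round(3 * 1.5 * sigma))                    # neighbourhood radius
    hist = np.zeros(nbins)
    H, W = L.shape
    for i in range(max(1, x - r), min(H - 1, x + r + 1)):
        for j in range(max(1, y - r), min(W - 1, y + r + 1)):
            dx = L[i + 1, j] - L[i - 1, j]
            dy = L[i, j + 1] - L[i, j - 1]
            m = np.hypot(dx, dy)                       # gradient magnitude
            theta = np.degrees(np.arctan2(dy, dx)) % 360.0
            hist[int(theta // (360.0 / nbins)) % nbins] += m
    return (np.argmax(hist) + 0.5) * (360.0 / nbins)   # centre of the peak bin
```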
S204, generating the feature descriptor:
for each feature point, to ensure the rotation invariance of the vector, the coordinate axes are rotated by an angle θ in the neighborhood coordinates centered on the feature point, where θ is the main-direction angle of the feature point;
after rotation, a 16×16 window centered on the main direction is taken; the gradient magnitude and gradient direction of each pixel in the window are computed, and a Gaussian function G(x, y, σ) with σ = 4 is used to assign a weight to the magnitude of each sample point,
where:
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
Finally, the weighted accumulated magnitude in each of 8 directions is computed on every 4×4 sub-block to form a seed point; that is, each keypoint is described by 16 seed points, so that one keypoint produces a 128-dimensional SIFT feature vector;
finally, the length of the obtained feature vector is normalized to further remove the influence of illumination, giving the SIFT feature, i.e., the descriptor.
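In practice the whole of step S2 (scale space, keypoints and 128-dimensional descriptors) can be delegated to an off-the-shelf SIFT implementation instead of being re-implemented; a brief OpenCV sketch, assuming the upscaled image from step S1 has been saved under an illustrative file name:

```python
import cv2

# self-convolved (upscaled) training image produced by step S1; the file name is illustrative
train_up = cv2.imread("train_selfconv.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(train_up, None)
# 'descriptors' is an (N, 128) array: one 128-dimensional SIFT vector per keypoint
```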
S3, matching the training image with the query image based on the extracted SIFT descriptor;
First, the query image is substituted into step S1 to obtain its self-convolution result, and this self-convolution image of the query image is then substituted into step S2 in place of the training image to obtain the feature points and descriptors of the query image. For each descriptor corresponding to a feature point in the training image, the Euclidean distances to the descriptors of all feature points in the query image are computed; if the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched.
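A sketch of this threshold matching, assuming descriptors have already been extracted from the self-convolved training and query images; the function name and the default threshold are illustrative, since the description does not fix a value:

```python
import numpy as np

def match_by_distance(desc_train, desc_query, threshold=200.0):
    """Step S3 sketch: match descriptors whose Euclidean distance is below a threshold."""
    # pairwise Euclidean distances, shape (num_train, num_query)
    d = np.linalg.norm(desc_train[:, None, :] - desc_query[None, :, :], axis=2)
    matches = []
    for i in range(d.shape[0]):
        j = int(np.argmin(d[i]))          # nearest query descriptor for training descriptor i
        if d[i, j] < threshold:
            matches.append((i, j))
    return matches
```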
S4, eliminating error matching generated in the matching process.
In step S4, erroneous matches are eliminated using either of the following methods:
1. cross-filtering:
If the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched. When a feature point on the training image has been matched to a feature point on the query image, a check in the opposite direction is performed, i.e., the feature point on the query image is matched back against the feature points on the training image; if this reverse matching also succeeds, the match is considered correct, and if it does not succeed, the match is considered a false match and is removed.
2. Ratio test:
For each match, the two nearest-neighbor descriptors are returned, i.e., the two descriptors on the query image with the smallest Euclidean distances to the matched descriptor on the training image; the match is considered correct only when the ratio of the Euclidean distance to the first (nearest) descriptor to the Euclidean distance to the second descriptor is smaller than a set threshold, and if it is larger than the set threshold, the match is considered erroneous and is removed.
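Both rejection strategies are available directly through OpenCV's brute-force matcher; a sketch under the usual interpretation of the ratio test (nearest distance divided by second-nearest distance), with the descriptor arrays assumed to come from step S2 and 0.7 used only as an illustrative ratio:

```python
import cv2

def filter_matches(desc_train, desc_query, ratio=0.7):
    """Step S4 sketch: reject false matches by cross-filtering or a ratio test."""
    # 1. Cross-filtering: keep only matches that are mutual in both directions.
    bf_cross = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    cross_matches = bf_cross.match(desc_train, desc_query)

    # 2. Ratio test: keep a match only when its nearest neighbour is clearly
    #    closer than its second-nearest neighbour.
    bf = cv2.BFMatcher(cv2.NORM_L2)
    pairs = bf.knnMatch(desc_train, desc_query, k=2)
    ratio_matches = [m for m, n in (p for p in pairs if len(p) == 2)
                     if m.distance < ratio * n.distance]
    return cross_matches, ratio_matches
```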
The beneficial effects of the invention are as follows: the method matches images captured by the unmanned aerial vehicle in real time against previously acquired satellite images, and the self-convolution method can effectively remove unnecessary details of the images while retaining or highlighting important edge information.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is an original image in an embodiment;
FIG. 3 is the self-convolution image obtained using a convolution kernel of size 5×5 in an embodiment;
FIG. 4 is an image obtained by extracting edges from the original image using the Prewitt algorithm in an embodiment;
FIG. 5 is an image obtained by extracting edges from the self-convolution image using the Prewitt algorithm in an embodiment;
FIG. 6 is a diagram showing the result of matching an unprocessed training image with a query image in an embodiment;
FIG. 7 is a diagram showing the result of matching a training image processed with Gaussian blur with a query image in an embodiment;
FIG. 8 is a diagram showing the result of matching a training image processed with the self-convolution method with a query image in an embodiment;
FIG. 9 is an image of the third layer of the Gaussian pyramid of a training image in an embodiment;
FIG. 10 is an image of the third layer of the DoG pyramid of a training image in an embodiment;
FIG. 11 is a descriptor of a training image in an embodiment (only the upper left corner is shown).
Detailed Description
The technical solution of the present invention will be described in further detail with reference to the accompanying drawings, but the scope of the present invention is not limited to the following description.
As shown in fig. 1, an unmanned aerial vehicle image upscaling matching method based on local self-convolution comprises the following steps: S1, performing the upscaling operation on a training image by local self-convolution; S2, extracting features of the self-convolved training image by using the SIFT method; S3, matching the training image with the query image; S4, eliminating erroneous matches generated in the matching process.
In step S1, when performing self-convolution on each pixel point in the training image, a matrix of size (2a+1)×(2b+1) centered on that pixel point is used as its convolution kernel;
the pixel values of the pixel points contained in this matrix of size (2a+1)×(2b+1) are used as the matrix elements.
The step S1 includes:
S101, for any point (x0, y0), compute the value g(x0, y0) of the self-convolution image g(x, y) at (x0, y0):
g(x0, y0) = Σ(dx = -a..a) Σ(dy = -b..b) w_(x0,y0)(dx, dy) · f(x0-dx, y0-dy)
wherein
w_(x0,y0) = f_(x0,y0)
f_(x0,y0) represents the matrix of size (2a+1)×(2b+1) of the original image centered on (x0, y0), w_(x0,y0) is the convolution kernel of size (2a+1)×(2b+1) at (x0, y0), and f(x0-dx, y0-dy) represents the pixel value at the pixel point (x0-dx, y0-dy);
S102, repeat step S101 for every point of the training image to obtain the value of the self-convolution image g(x, y) at each point; the resulting self-convolution image g(x, y) is taken as the upscaled image;
for a pixel point on the image boundary, where a full convolution kernel of size (2a+1)×(2b+1) cannot be obtained, the pixel value of that point is taken directly as its self-convolution result.
For a 5×5 image with pixel values x11, x12, …, x55, self-convolution is performed with 3×3 kernels. Since the kernel size is (3, 3), only the values at the positions (2,2), (2,3), (2,4), (3,2), (3,3), (3,4), (4,2), (4,3) and (4,4) change, while the boundary pixels keep their original values; the changed values give the self-convolution result of the image.
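A quick numerical check of this behaviour, reusing the step-S1 sketch given earlier; the random 5×5 array merely stands in for the values x11 … x55:

```python
import numpy as np

rng = np.random.default_rng(0)
img5 = rng.integers(0, 256, size=(5, 5)).astype(np.float64)   # stands in for x11 ... x55

g5 = local_self_convolution(img5, a=1, b=1)                   # 3x3 local kernels
print(np.array_equal(g5[[0, 4], :], img5[[0, 4], :]))         # True: boundary rows unchanged
print(np.array_equal(g5[:, [0, 4]], img5[:, [0, 4]]))         # True: boundary columns unchanged
print(np.any(g5[1:4, 1:4] != img5[1:4, 1:4]))                 # True: the nine interior positions change
```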
When the Prewitt operator is used to compute the edges of this image (only the x direction is shown here), we focus on the (3, 3) position, whose value is
x24(x24+x35)-x22(x22-x33)+x34(x34+x23)-x32(x32-x43)+x44(x44+x33)-x42(x42-x31)+(x14*x25+x34*x45+x43*x54)-(x41*x52+x12*x23+x21*x32)
This is the value extracted by the Prewitt operator at position (3, 3) after the image has been self-convolved. Compared with the value extracted by Prewitt from the non-self-convolved image,
x24-x22+x34-x32+x44-x42
the weighted form takes the surrounding, farther pixels into account with different weights and finally adds two symmetric terms to correct the value.
For fig. 2, when a convolution kernel of size 5×5 is used, the resulting self-convolution image is fig. 3.
It can be seen that the self-convolution eliminates the lines of smaller width. Extracting edges from fig. 2 and fig. 3 with the Prewitt algorithm gives fig. 4 and fig. 5, respectively.
It is obvious that self-convolution of the image effectively removes unnecessary details, while the important edge information of the image is well preserved and even enhanced.
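The comparison of figs. 4 and 5 can be reproduced with a standard filtering call; a sketch that reuses the step-S1 function sketched earlier, with a = b = 2 giving the 5×5 window of the embodiment (the file name is illustrative, and only the x-direction Prewitt kernel is shown):

```python
import cv2
import numpy as np

prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=np.float64)

img = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)  # illustrative path
selfconv = local_self_convolution(img, a=2, b=2)        # 5x5 local kernels, as used for fig. 3

edges_raw = cv2.filter2D(img, -1, prewitt_x)            # cf. fig. 4: edges of the original image
edges_selfconv = cv2.filter2D(selfconv, -1, prewitt_x)  # cf. fig. 5: edges after self-convolution
```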
In step S2, SIFT is used to extract features of the image:
the SIFT algorithm was proposed by Lowe in 2004; it is invariant to rotation, scaling, brightness changes and the like, and is a very stable local feature.
To extract SIFT features, the method mainly comprises the following steps:
An image pyramid is constructed first;
images of different blur levels can be obtained using different Gaussian kernels, and the Gaussian scale space of an image is derived from its convolution with Gaussians of different scales:
L(x,y,σ)=G(x,y,σ)*I(x,y)
where G(x, y, σ) is the Gaussian kernel function,
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
σ is called the scale space factor; it is the standard deviation of the Gaussian normal distribution and reflects the degree to which the image is blurred: the larger its value, the more blurred the image and the larger the corresponding scale.
The scale space is constructed in order to detect feature points that exist at different scales. Let k be the scale factor between two adjacent Gaussian scale spaces; the DoG is defined as:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
wherein L (x, y, σ) is the gaussian scale space of the image;
An image pyramid is a group of results obtained from the same image at different resolutions: the original image is first Gaussian-blurred, and the blurred image is then downsampled to obtain a series of images of successively smaller size;
carrying out Gaussian blur on each layer of image of the image pyramid by using different parameters sigma to obtain the Gaussian pyramid;
the resulting gaussian pyramid third layer is shown in fig. 9. In downsampling, the first image of the group of images above the pyramid is downsampled from the third image of the group below. After the construction of the Gaussian pyramid is completed, the adjacent Gaussian pyramids are subtracted to obtain the DoG pyramid. The resulting DoG pyramid is shown in fig. 10.
The purpose of constructing the scale space is to detect feature points that exist at different scales. A good operator for detecting such feature points is the Laplacian of Gaussian (LoG), ∇²G, in its scale-normalized form σ²∇²G, where
∇²G = ∂²G/∂x² + ∂²G/∂y²
direct use of LoG computation is relatively expensive, and DoG is typically used to approximate LoG. Let k be the scale factor of two adjacent gaussian scale spaces, the definition of DoG:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
where L (x, y, σ) is the gaussian scale space of the image.
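The approximation D(x, y, σ) ≈ (k − 1)σ²∇²G that justifies replacing the LoG by the DoG can be checked numerically; a small sketch on an arbitrary grid, with σ = 1.6 and k = 2^(1/3) as illustrative values:

```python
import numpy as np

sigma, k = 1.6, 2.0 ** (1.0 / 3.0)
ax = np.linspace(-8.0, 8.0, 401)
x, y = np.meshgrid(ax, ax)
r2 = x ** 2 + y ** 2

def gaussian(s):
    return np.exp(-r2 / (2 * s ** 2)) / (2 * np.pi * s ** 2)

dog = gaussian(k * sigma) - gaussian(sigma)
# analytic Laplacian of the Gaussian: (r^2 - 2*sigma^2) / sigma^4 * G(x, y, sigma)
log = (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(sigma)
approx = (k - 1) * sigma ** 2 * log

# first-order approximation in (k - 1): the residual shrinks as k approaches 1
print(np.max(np.abs(dog - approx)), np.max(np.abs(dog)))
```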
Then detecting extremum in the DoG space, and eliminating bad extremum points:
For a candidate feature point x, its offset is defined as Δx and its contrast is the absolute value |D(x)| of D(x); applying a Taylor expansion to D(x):
D(x+Δx) = D(x) + (∂D(x)/∂x)^T·Δx + (1/2)·Δx^T·(∂²D(x)/∂x²)·Δx
Since x is an extreme point of D(x), differentiating the above formula and setting the derivative to 0 gives
Δx = -(∂²D(x)/∂x²)^(-1)·(∂D(x)/∂x)
Substituting the obtained Δx back into the Taylor expansion of D(x) gives
D(x+Δx) = D(x) + (1/2)·(∂D(x)/∂x)^T·Δx
Let the contrast threshold be T; if
|D(x+Δx)| ≥ T
the feature point is retained; otherwise it is removed.
Then, the main direction of the characteristic points is calculated:
At the scale image of the feature point,
L(x,y)=G(x,y,σ)*I(x,y)
the gradient magnitude and angle are computed over the region centered on the feature point with radius 3×1.5σ; the magnitude m(x, y) and direction θ(x, y) of the gradient at each point L(x, y) can be obtained by
m(x,y) = sqrt( (L(x+1,y)-L(x-1,y))² + (L(x,y+1)-L(x,y-1))² )
θ(x,y) = arctan( (L(x,y+1)-L(x,y-1)) / (L(x+1,y)-L(x-1,y)) )
After the gradients are computed, a histogram is used to collect the gradient directions and magnitudes of the pixels in the neighborhood of the feature point. The horizontal axis of the histogram is the gradient-direction angle (the gradient direction ranges from 0 to 360 degrees; the histogram has 10 bins of 36 degrees each, or alternatively 8 bins of 45 degrees each), the vertical axis is the accumulated gradient magnitude for that direction, and the peak of the histogram gives the main direction of the feature point.
Finally, generating feature descriptions
1. The coordinate frame is rotated to the main direction, ensuring rotation invariance.
2. The descriptor is generated, finally forming a 128-dimensional feature vector.
3. Normalization: the length of the feature vector is normalized to further remove the influence of illumination.
The SIFT features are thus obtained.
To ensure the rotation invariance of the vector, the coordinate axes are rotated by θ (the main direction of the feature point) in the neighborhood coordinates centered on the feature point.
After rotation, a 16×16 window centered on the main direction is taken. The gradient magnitude and gradient direction of each pixel in the window are computed and then weighted with a Gaussian window. Finally, the accumulated value in each of 8 directions is computed on every 4×4 sub-block to form a seed point; that is, each keypoint is described by 16 seed points, so that one keypoint produces a 128-dimensional SIFT feature vector. The resulting descriptor is shown in fig. 11.
In step S4, mismatches are eliminated; this embodiment uses the ratio test. The two candidate methods are as follows:
1. Cross-filtering:
If the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched. When a feature point on the training image has been matched to a feature point on the query image, a check in the opposite direction is performed, i.e., the feature point on the query image is matched back against the feature points on the training image; if this reverse matching also succeeds, the match is considered correct, and if it does not succeed, the match is considered a false match and is removed.
2. Ratio test:
For each match, the two nearest-neighbor descriptors are returned, i.e., the two descriptors on the query image with the smallest Euclidean distances to the matched descriptor on the training image. The match is considered correct only when the ratio of the Euclidean distance to the first (nearest) descriptor to the Euclidean distance to the second descriptor is smaller than a set threshold; if it is larger than the set threshold, the match is considered erroneous and is removed.
While the foregoing description illustrates and describes a preferred embodiment of the present invention, it is to be understood that the invention is not limited to the form disclosed herein and is not to be construed as excluding other embodiments; it is capable of use in various other combinations, modifications and environments, and of changes or modifications within the scope of the inventive concept described herein, whether guided by the above teachings or by the knowledge or skill of the relevant art. All modifications and variations that do not depart from the spirit and scope of the invention are intended to fall within the scope of the appended claims.

Claims (6)

1. An unmanned aerial vehicle image upscaling matching method based on local self-convolution, characterized by comprising the following steps:
S1, performing self-convolution on each pixel point in a training image to realize the scale-up operation of the training image;
S2, extracting features of the self-convolved training image by using the SIFT method to obtain descriptors;
s3, matching the training image with the query image based on the extracted SIFT descriptor;
s4, eliminating error matching generated in the matching process.
2. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 1, characterized in that: in step S1, when performing self-convolution on each pixel point in the training image, a matrix of size (2a+1)×(2b+1) centered on that pixel point is used as its convolution kernel;
the pixel values of the pixel points contained in this matrix of size (2a+1)×(2b+1) are used as the matrix elements.
3. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 2, characterized in that step S1 comprises:
S101, for any point (x0, y0), compute the value g(x0, y0) of the self-convolution image g(x, y) at (x0, y0):
g(x0, y0) = Σ(dx = -a..a) Σ(dy = -b..b) w_(x0,y0)(dx, dy) · f(x0-dx, y0-dy)
wherein
w_(x0,y0) = f_(x0,y0)
f_(x0,y0) represents the matrix of size (2a+1)×(2b+1) of the original image centered on (x0, y0), w_(x0,y0) is the convolution kernel of size (2a+1)×(2b+1) at (x0, y0), and f(x0-dx, y0-dy) represents the pixel value at the pixel point (x0-dx, y0-dy);
S102, repeat step S101 for every point of the training image to obtain the value of the self-convolution image g(x, y) at each point; the resulting self-convolution image g(x, y) is taken as the upscaled image;
for a pixel point on the image boundary, where a full convolution kernel of size (2a+1)×(2b+1) cannot be obtained, the pixel value of that point is taken directly as its self-convolution result.
4. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 1, characterized in that step S2 comprises the following sub-steps:
S201, constructing a DoG pyramid of the upscaled image g(x, y):
A1, constructing a Gaussian scale space as the Gaussian blur result:
the Gaussian scale space of an image is defined as a function L(x, y, σ) obtained by convolving the Gaussian kernel function G(x, y, σ) with the input image I(x, y):
L(x,y,σ)=G(x,y,σ)*I(x,y)
where * denotes convolution and
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
σ is called the scale space factor; it is the standard deviation of the Gaussian normal distribution and reflects the degree to which the image is blurred: the larger its value, the more blurred the image and the larger the corresponding scale;
A2, first apply Gaussian blur to the obtained upscaled image g(x, y); the Gaussian-blurred version of g(x, y) serves as the first layer of the Gaussian pyramid. Then repeatedly downsample, starting from this blurred image, to obtain a series of successively smaller images; each downsampled image forms one layer, and the layers, taken in downsampling order, form the image pyramid of g(x, y);
A3, Gaussian-blur each layer image of the image pyramid with n sequentially arranged scale space factors to obtain n Gaussian-blurred images with different scale space factors; for any two adjacent scale space factors, the ratio of the latter to the former is k;
for the n Gaussian-blurred images with different scale space factors, the DoG spaces are computed:
for the Gaussian-blurred images corresponding to two adjacent scale space factors, the DoG is computed as follows:
D(x,y,σ)=[G(x,y,kσ)-G(x,y,σ)]*I(x,y)=L(x,y,kσ)-L(x,y,σ)
wherein L (x, y, σ) is the gaussian scale space of the image;
Since there are n Gaussian-blurred images with different scale space factors, a total of n-1 DoG spaces are obtained;
A4, repeat step A3 for each layer image of the image pyramid to obtain the DoG pyramid of g(x, y);
S202, detecting extrema in the DoG pyramid and eliminating the extreme points that do not meet the conditions, so as to obtain the feature points:
for each layer image of the image pyramid, the n-1 DoG spaces obtained from it are processed as follows:
to find the extreme points of the scale space, each pixel point of every DoG space is compared with all of its neighboring points in the same scale space and in the adjacent scale spaces; when its pixel value is larger than the pixel values of all neighboring points or smaller than all of them, the current pixel point is an extreme point;
then, the obtained extreme points are taken as candidate feature points, and the extreme points that do not meet the conditions are eliminated:
For any candidate feature point x, its offset is defined as Δx and its contrast is the absolute value |D(x)| of D(x); applying a Taylor expansion to D(x):
D(x+Δx) = D(x) + (∂D(x)/∂x)^T·Δx + (1/2)·Δx^T·(∂²D(x)/∂x²)·Δx
Since x is an extreme point of D(x), differentiating the above formula and setting the derivative to 0 gives
Δx = -(∂²D(x)/∂x²)^(-1)·(∂D(x)/∂x)
Substituting the obtained Δx back into the Taylor expansion of D(x) gives
D(x+Δx) = D(x) + (1/2)·(∂D(x)/∂x)^T·Δx
Let the contrast threshold be T; if
|D(x+Δx)| ≥ T
the feature point is retained; otherwise it is removed;
S203, calculating the main direction of the feature points:
at the scale image of the feature point,
L(x,y)=G(x,y,σ)*I(x,y)
the gradient magnitude and direction are computed over the region centered on the feature point with radius 3×1.5σ; the magnitude m(x, y) and direction θ(x, y) of the gradient at each point L(x, y) are obtained by
m(x,y) = sqrt( (L(x+1,y)-L(x-1,y))² + (L(x,y+1)-L(x,y-1))² )
θ(x,y) = arctan( (L(x,y+1)-L(x,y-1)) / (L(x+1,y)-L(x-1,y)) )
After the gradients are computed, a histogram is used to collect the gradient directions and magnitudes of the pixels in the neighborhood of the feature point; the horizontal axis of the histogram is the gradient-direction angle, the vertical axis is the accumulated gradient magnitude for that direction, and the peak of the histogram gives the main direction of the feature point;
S204, generating the feature descriptor:
for each feature point, to ensure the rotation invariance of the vector, the coordinate axes are rotated by an angle θ in the neighborhood coordinates centered on the feature point, where θ is the main-direction angle of the feature point;
after rotation, a 16×16 window centered on the main direction is taken; the gradient magnitude and gradient direction of each pixel in the window are computed, and a Gaussian function G(x, y, σ) with σ = 4 is used to assign a weight to the magnitude of each sample point,
where:
G(x,y,σ) = (1/(2πσ²)) · exp(-(x²+y²)/(2σ²))
Finally, the weighted accumulated magnitude in each of 8 directions is computed on every 4×4 sub-block to form a seed point; that is, each keypoint is described by 16 seed points, so that one keypoint produces a 128-dimensional SIFT feature vector;
finally, the length of the obtained feature vector is normalized to further remove the influence of illumination, giving the SIFT feature, i.e., the descriptor.
5. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 1, characterized in that step S3 comprises:
first, the query image is substituted into step S1 to obtain its self-convolution result, and this self-convolution image of the query image is then substituted into step S2 in place of the training image to obtain the feature points and descriptors of the query image; for each descriptor corresponding to a feature point in the training image, the Euclidean distances to the descriptors of all feature points in the query image are computed, and if the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched.
6. The unmanned aerial vehicle image upscaling matching method based on local self-convolution according to claim 1, characterized in that, when eliminating erroneous matches in step S4, either of the following methods is used:
1. cross-filtering:
If the Euclidean distance between a descriptor on the training image and a descriptor on the query image is smaller than a given threshold, the feature points corresponding to the two descriptors are considered successfully matched. When a feature point on the training image has been matched to a feature point on the query image, a check in the opposite direction is performed, i.e., the feature point on the query image is matched back against the feature points on the training image; if this reverse matching also succeeds, the match is considered correct, and if it does not succeed, the match is considered a false match and is removed.
2. Ratio test:
For each match, the two nearest-neighbor descriptors are returned, i.e., the two descriptors on the query image with the smallest Euclidean distances to the matched descriptor on the training image; the match is considered correct only when the ratio of the Euclidean distance to the first (nearest) descriptor to the Euclidean distance to the second descriptor is smaller than a set threshold, and if it is larger than the set threshold, the match is considered erroneous and is removed.
CN202211717727.2A 2022-12-29 2022-12-29 Unmanned aerial vehicle image upscaling matching method based on local self-convolution Pending CN116206139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211717727.2A CN116206139A (en) 2022-12-29 2022-12-29 Unmanned aerial vehicle image upscaling matching method based on local self-convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211717727.2A CN116206139A (en) 2022-12-29 2022-12-29 Unmanned aerial vehicle image upscaling matching method based on local self-convolution

Publications (1)

Publication Number Publication Date
CN116206139A true CN116206139A (en) 2023-06-02

Family

ID=86508575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211717727.2A Pending CN116206139A (en) 2022-12-29 2022-12-29 Unmanned aerial vehicle image upscaling matching method based on local self-convolution

Country Status (1)

Country Link
CN (1) CN116206139A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132913A (en) * 2023-10-26 2023-11-28 山东科技大学 Ground surface horizontal displacement calculation method based on unmanned aerial vehicle remote sensing and feature recognition matching
CN117132913B (en) * 2023-10-26 2024-01-26 山东科技大学 Ground surface horizontal displacement calculation method based on unmanned aerial vehicle remote sensing and feature recognition matching

Similar Documents

Publication Publication Date Title
EP2534612B1 (en) Efficient scale-space extraction and description of interest points
EP2138978B1 (en) System and method for finding stable keypoints in a picture image using localized scale space properties
Bouchiha et al. Automatic remote-sensing image registration using SURF
CN108986152B (en) Foreign matter detection method and device based on difference image
CN103065135A (en) License number matching algorithm based on digital image processing
CN110634137A (en) Bridge deformation monitoring method, device and equipment based on visual perception
CN111242050A (en) Automatic change detection method for remote sensing image in large-scale complex scene
CN112017223A (en) Heterologous image registration method based on improved SIFT-Delaunay
CN114897705A (en) Unmanned aerial vehicle remote sensing image splicing method based on feature optimization
Liu et al. Multi-sensor image registration by combining local self-similarity matching and mutual information
CN114359591A (en) Self-adaptive image matching algorithm with edge features fused
CN112614167A (en) Rock slice image alignment method combining single-polarization and orthogonal-polarization images
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
CN112907580A (en) Image feature extraction and matching algorithm applied to comprehensive point-line features in weak texture scene
CN116206139A (en) Unmanned aerial vehicle image upscaling matching method based on local self-convolution
CN110929598A (en) Unmanned aerial vehicle-mounted SAR image matching method based on contour features
CN107808165B (en) Infrared image matching method based on SUSAN corner detection
CN115205558B (en) Multi-mode image matching method and device with rotation and scale invariance
CN116091998A (en) Image processing method, device, computer equipment and storage medium
CN114004770B (en) Method and device for accurately correcting satellite space-time diagram and storage medium
CN114255398A (en) Method and device for extracting and matching features of satellite video image
CN115601569A (en) Different-source image optimization matching method and system based on improved PIIFD
CN114972453A (en) Improved SAR image region registration method based on LSD and template matching
Hou et al. Navigation landmark recognition and matching algorithm based on the improved SURF
CN113222028A (en) Image feature point real-time matching method based on multi-scale neighborhood gradient model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination