CN116168221B - Transformer-based cross-mode image matching and positioning method and device - Google Patents

Transformer-based cross-mode image matching and positioning method and device

Info

Publication number
CN116168221B
CN116168221B (application CN202310450328.2A)
Authority
CN
China
Prior art keywords
image
generator
loss function
input
output
Prior art date
Legal status
Active
Application number
CN202310450328.2A
Other languages
Chinese (zh)
Other versions
CN116168221A (en)
Inventor
杨小冈
李清格
申通
卢瑞涛
朱正杰
张涛
谢学立
Current Assignee
Rocket Force University of Engineering of PLA
Original Assignee
Rocket Force University of Engineering of PLA
Priority date
Filing date
Publication date
Application filed by Rocket Force University of Engineering of PLA
Priority to CN202310450328.2A
Publication of CN116168221A
Application granted
Publication of CN116168221B
Legal status: Active
Anticipated expiration

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/66Analysis of geometric attributes of image moments or centre of gravity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Image Processing (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)

Abstract

The invention relates to a Transformer-based cross-mode image matching and positioning method and device, which solve the problems of low cross-mode image matching precision and large positioning error. Cross-mode image style migration is adopted to convert cross-mode images into the same feature domain, and a Transformer-based intelligent matching algorithm effectively improves the matching precision, thereby realizing reliable and effective cross-mode image matching and geographic positioning.

Description

Transformer-based cross-mode image matching and positioning method and device
Technical Field
The application relates to the technical field of autonomous visual positioning of aircraft, in particular to a Transformer-based cross-mode image matching and positioning method and device.
Background
Vision-based autonomous positioning technology for aircraft is developing rapidly; it is a development requirement for aircraft navigation guidance, situation awareness and autonomous decision-making, and plays an irreplaceable role in typical space-based platform tasks such as target detection, recognition and tracking. A single-mode sensor acquires limited information, whereas an infrared image reflects the thermal radiation of objects, is not easily affected by external illumination, and can image effectively at night or in a smoke environment. Therefore, matching the aerial real-time infrared image with a visible light image of known geographic information yields richer information, satisfies the night geolocation needs of an aircraft and the all-day operating requirements of its navigation system, and has important and broad application prospects.
Developing a cross-modal image matching and positioning algorithm for an aircraft is a fundamental yet highly challenging task. The task first obtains a real-time image through an onboard camera; it then matches the real-time image with a reference image of known geographic position using an intelligent matching algorithm and determines the positions of feature points of the real-time image in the reference image, where the real-time image and the reference image are cross-modal images; finally, from the matching correspondences of the feature points, it calculates the actual geographic positioning information of the aircraft with a multi-view geometry algorithm. The biggest difficulty in cross-modal image matching and positioning is to match cross-modal images accurately, robustly and effectively. When the modality difference between images is large, the viewing-angle change is large or the features are not obvious, the performance of a matching algorithm is greatly affected. Traditional matching methods include region-based methods and feature-based methods: the former are simple in principle but are not real-time and easily fall into a local optimum; the latter have a small computational load, but the extracted features are shallow and hand-crafted feature points cannot capture semantic information, which easily causes mismatching. Deep-learning-based matching methods extract image features with deep neural networks and achieve higher matching precision, but they are difficult to apply directly to cross-mode image matching with obvious modality differences.
Disclosure of Invention
In order to overcome at least one defect in the prior art, the application provides a Transformer-based cross-mode image matching and positioning method and device.
In a first aspect, a Transformer-based cross-mode image matching and positioning method is provided, including:
acquiring a real-time infrared image and a visible light image under the view angle of the unmanned aerial vehicle;
adopting a cross-mode image style migration network structure to perform style migration on the visible light image to obtain a pseudo infrared image;
performing intelligent matching on the real-time infrared image and the pseudo-infrared image by adopting a Transformer intelligent matching method to obtain a characteristic point matching relationship;
determining a homography transformation matrix according to the characteristic point matching relation;
according to the homography transformation matrix, perspective transformation is carried out on the center point of the real-time infrared image, and the pixel point corresponding to the center point in the pseudo infrared image is determined;
mapping pixel points corresponding to the center points in the pseudo infrared image onto the visible light image, and determining mapping points in the visible light image;
and obtaining a geographic positioning result of the unmanned aerial vehicle according to the geographic position information corresponding to the mapping points in the visible light image.
In one embodiment, the cross-modality image style migration network structure is a CycleGAN network structure, and the total loss function is:
$$\mathcal{L}(G,F,D_X,D_Y)=\mathcal{L}_{GAN}(G,D_Y,X,Y)+\mathcal{L}_{GAN}(F,D_X,Y,X)+\lambda_{cyc}\,\mathcal{L}_{cyc}(G,F)+\lambda_{idt}\,\mathcal{L}_{idt}(G,F)$$
wherein G is the first generator, F is the second generator, X is the source domain, Y is the target domain, $D_Y$ is the first discriminator and $D_X$ is the second discriminator; $\mathcal{L}(G,F,D_X,D_Y)$ is the total loss function, $\mathcal{L}_{GAN}(G,D_Y,X,Y)$ is the adversarial loss function between the first generator G and the first discriminator $D_Y$, $\mathcal{L}_{GAN}(F,D_X,Y,X)$ is the adversarial loss function between the second generator F and the second discriminator $D_X$, $\mathcal{L}_{cyc}(G,F)$ is the cycle consistency loss function, $\mathcal{L}_{idt}(G,F)$ is the ontology consistency loss function, $\lambda_{cyc}$ is the weight coefficient of the cycle consistency loss function, and $\lambda_{idt}$ is the weight coefficient of the ontology consistency loss function.
In one embodiment, the adversarial loss function $\mathcal{L}_{GAN}(G,D_Y,X,Y)$ between the first generator G and the first discriminator $D_Y$ adopts the following formula:
$$\mathcal{L}_{GAN}(G,D_Y,X,Y)=\mathbb{E}_{y\sim p_{data}(y)}\big[\log D_Y(y)\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\log\big(1-D_Y(G(x))\big)\big]$$
wherein y is a real image of the target domain, x is a real image of the source domain, G(x) is the output of the first generator G when the input is x, $\mathbb{E}$ is the expected value, $D_Y(y)$ is the output of the first discriminator $D_Y$ when the input is y, and $D_Y(G(x))$ is the output of the first discriminator $D_Y$ when the input is G(x);
the adversarial loss function $\mathcal{L}_{GAN}(F,D_X,Y,X)$ between the second generator F and the second discriminator $D_X$ adopts the following formula:
$$\mathcal{L}_{GAN}(F,D_X,Y,X)=\mathbb{E}_{x\sim p_{data}(x)}\big[\log D_X(x)\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\log\big(1-D_X(F(y))\big)\big]$$
wherein F(y) is the output of the second generator F when the input is y, $D_X(x)$ is the output of the second discriminator $D_X$ when the input is x, and $D_X(F(y))$ is the output of the second discriminator $D_X$ when the input is F(y);
the cycle consistency loss function $\mathcal{L}_{cyc}(G,F)$ adopts the following formula:
$$\mathcal{L}_{cyc}(G,F)=\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(G(x))-x\rVert_1\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(F(y))-y\rVert_1\big]$$
wherein F(G(x)) is the output of the second generator F when the input is G(x), and G(F(y)) is the output of the first generator G when the input is F(y);
the ontology consistency loss function $\mathcal{L}_{idt}(G,F)$ adopts the following formula:
$$\mathcal{L}_{idt}(G,F)=\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(y)-y\rVert_1\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(x)-x\rVert_1\big]$$
wherein G(y) is the output of the first generator G when the input is y, and F(x) is the output of the second generator F when the input is x.
In one embodiment, performing intelligent matching on the real-time infrared image and the pseudo infrared image by a Transformer intelligent matching method to obtain a characteristic point matching relationship includes:
respectively performing feature extraction on the real-time infrared image and the pseudo-infrared image with a twin (Siamese) network built on a ResNet50 backbone to obtain two feature maps, and stitching the two feature maps to obtain a stitched feature map;
adding a position encoding to the stitched feature map to obtain a context feature map;
inputting the query points together with the context feature map into a Transformer encoder-decoder structure to obtain a high-dimensional vector;
and inputting the high-dimensional vector into a multi-layer perceptron to obtain the characteristic point matching relationship.
In a second aspect, a Transformer-based cross-modality image matching and positioning device is provided, including:
the image acquisition module is used for acquiring real-time infrared images and visible light images under the view angle of the unmanned aerial vehicle;
the image style migration module is used for performing style migration on the visible light image by adopting a cross-mode image style migration network structure to obtain a pseudo infrared image;
the intelligent matching module is used for intelligently matching the real-time infrared image and the pseudo-infrared image by adopting a Transformer intelligent matching method to obtain a characteristic point matching relationship;
the homography transformation matrix determining module is used for determining a homography transformation matrix according to the characteristic point matching relation;
the perspective transformation module is used for carrying out perspective transformation on the center point of the real-time infrared image according to the homography transformation matrix, and determining the pixel point corresponding to the center point in the pseudo infrared image;
the mapping module is used for mapping the pixel points corresponding to the center points in the pseudo infrared image to the visible light image and determining the mapping points in the visible light image;
and the positioning result determining module is used for obtaining the geographic positioning result of the unmanned aerial vehicle according to the geographic position information corresponding to the mapping points in the visible light image.
In one embodiment, the cross-modality image style migration network structure is a CycleGAN network structure, and the total loss function is:
$$\mathcal{L}(G,F,D_X,D_Y)=\mathcal{L}_{GAN}(G,D_Y,X,Y)+\mathcal{L}_{GAN}(F,D_X,Y,X)+\lambda_{cyc}\,\mathcal{L}_{cyc}(G,F)+\lambda_{idt}\,\mathcal{L}_{idt}(G,F)$$
wherein G is the first generator, F is the second generator, X is the source domain, Y is the target domain, $D_Y$ is the first discriminator and $D_X$ is the second discriminator; $\mathcal{L}(G,F,D_X,D_Y)$ is the total loss function, $\mathcal{L}_{GAN}(G,D_Y,X,Y)$ is the adversarial loss function between the first generator G and the first discriminator $D_Y$, $\mathcal{L}_{GAN}(F,D_X,Y,X)$ is the adversarial loss function between the second generator F and the second discriminator $D_X$, $\mathcal{L}_{cyc}(G,F)$ is the cycle consistency loss function, $\mathcal{L}_{idt}(G,F)$ is the ontology consistency loss function, $\lambda_{cyc}$ is the weight coefficient of the cycle consistency loss function, and $\lambda_{idt}$ is the weight coefficient of the ontology consistency loss function.
In one embodiment, the adversarial loss function $\mathcal{L}_{GAN}(G,D_Y,X,Y)$ between the first generator G and the first discriminator $D_Y$ adopts the following formula:
$$\mathcal{L}_{GAN}(G,D_Y,X,Y)=\mathbb{E}_{y\sim p_{data}(y)}\big[\log D_Y(y)\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\log\big(1-D_Y(G(x))\big)\big]$$
wherein y is a real image of the target domain, x is a real image of the source domain, G(x) is the output of the first generator G when the input is x, $\mathbb{E}$ is the expected value, $D_Y(y)$ is the output of the first discriminator $D_Y$ when the input is y, and $D_Y(G(x))$ is the output of the first discriminator $D_Y$ when the input is G(x);
the adversarial loss function $\mathcal{L}_{GAN}(F,D_X,Y,X)$ between the second generator F and the second discriminator $D_X$ adopts the following formula:
$$\mathcal{L}_{GAN}(F,D_X,Y,X)=\mathbb{E}_{x\sim p_{data}(x)}\big[\log D_X(x)\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\log\big(1-D_X(F(y))\big)\big]$$
wherein F(y) is the output of the second generator F when the input is y, $D_X(x)$ is the output of the second discriminator $D_X$ when the input is x, and $D_X(F(y))$ is the output of the second discriminator $D_X$ when the input is F(y);
the cycle consistency loss function $\mathcal{L}_{cyc}(G,F)$ adopts the following formula:
$$\mathcal{L}_{cyc}(G,F)=\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(G(x))-x\rVert_1\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(F(y))-y\rVert_1\big]$$
wherein F(G(x)) is the output of the second generator F when the input is G(x), and G(F(y)) is the output of the first generator G when the input is F(y);
the ontology consistency loss function $\mathcal{L}_{idt}(G,F)$ adopts the following formula:
$$\mathcal{L}_{idt}(G,F)=\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(y)-y\rVert_1\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(x)-x\rVert_1\big]$$
wherein G(y) is the output of the first generator G when the input is y, and F(x) is the output of the second generator F when the input is x.
In one embodiment, the intelligent matching module is further configured for:
respectively performing feature extraction on the real-time infrared image and the pseudo-infrared image with a twin (Siamese) network built on a ResNet50 backbone to obtain two feature maps, and stitching the two feature maps to obtain a stitched feature map;
adding a position encoding to the stitched feature map to obtain a context feature map;
inputting the query points together with the context feature map into a Transformer encoder-decoder structure to obtain a high-dimensional vector;
and inputting the high-dimensional vector into a multi-layer perceptron to obtain the characteristic point matching relationship.
Compared with the prior art, the application has the following beneficial effects:
1. The invention provides a general framework for a cross-mode image matching and positioning method, which uses a generative adversarial network to convert cross-mode images with large feature differences into the same feature domain, solving the problem of mismatching caused by the imaging differences of cross-mode images.
2. A network model is constructed to perform style migration on the visible light image; besides the adversarial loss and the cycle consistency loss in the original CycleGAN, the total loss function constructs an ontology consistency loss to accelerate the convergence of the network.
3. The style-migrated image is matched with the real-time infrared image by a Transformer-based intelligent matching algorithm, which remarkably improves the matching precision, and the geographic positioning result is obtained by perspective transformation. Experimental results show that the positioning method remarkably improves the matching performance of cross-mode images and achieves reliable and effective geographic positioning, verifying its effectiveness and superiority.
Drawings
The present application may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification, together with the following detailed description. In the drawings:
FIG. 1 shows a flow diagram of a Transformer-based cross-modality image matching localization method according to an embodiment of the present application;
FIG. 2 illustrates an image modality conversion schematic;
FIG. 3 shows a schematic diagram of a CycleGAN network architecture;
FIG. 4 shows a block diagram of a Transformer-based cross-modality image matching and positioning device according to an embodiment of the present application;
FIG. 5 shows a comparison of the positioning results of the positioning method of the present application and an existing method.
Detailed Description
Exemplary embodiments of the present application will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual embodiment are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions may be made to achieve the developers' specific goals, and that these decisions may vary from one implementation to another.
It should be noted that, in order to avoid obscuring the present application with unnecessary details, only the device structures closely related to the solution according to the present application are shown in the drawings, and other details not greatly related to the present application are omitted.
It should be understood that the present application is not limited to the embodiments described below with reference to the drawings. Herein, where possible, embodiments may be combined with each other, features may be replaced or borrowed between different embodiments, and one or more features may be omitted in an embodiment.
The cross-modal image style migration adopted herein can convert cross-modal images into the same feature domain, and the Transformer-based intelligent matching algorithm can effectively improve the matching precision, thereby realizing reliable and effective cross-modal image matching and geographic positioning.
An embodiment of the present application provides a Transformer-based cross-mode image matching and positioning method. FIG. 1 shows a flow chart of the Transformer-based cross-mode image matching and positioning method according to an embodiment of the present application; referring to FIG. 1, the method includes:
step S1, acquiring real-time infrared images and visible light images under the view angle of the unmanned aerial vehicle.
Here, an EVO II unmanned aerial vehicle cruises along a planned route in the specified visual navigation area at a flying height of 350 m with a forward, downward-looking view, and daytime visible light images containing houses, roads, plants and the like are captured.
Based on the satellite image, the longitude and latitude corresponding to each pixel point of the visible light image captured by the unmanned aerial vehicle are determined, yielding a reference image with known geographic position information. The unmanned aerial vehicle carrying an infrared camera is then flown in the visual navigation area to capture forward-looking real-time infrared images.
Step S2, adopting a cross-mode image style migration network structure to perform style migration on the visible light image to obtain a pseudo infrared image;
Step S3, performing intelligent matching on the real-time infrared image and the pseudo infrared image by a Transformer intelligent matching method to obtain a characteristic point matching relation;
s4, determining a homography transformation matrix according to the characteristic point matching relation;
s5, performing perspective transformation on a central point of the real-time infrared image according to the homography transformation matrix, and determining a pixel point corresponding to the central point in the pseudo-infrared image; here, the center point of the real-time infrared image is the current position of the drone.
Step S6, mapping the pixel points corresponding to the center points in the pseudo infrared image to the visible light image, and determining mapping points in the visible light image;
and S7, obtaining a geographic positioning result of the unmanned aerial vehicle according to the geographic position information corresponding to the mapping points in the visible light image. Here, the geographic position information corresponding to the mapping points is the geographic positioning result of the unmanned aerial vehicle; the geographical location information corresponding to each pixel point in the visible light image is known, so that a final geographical location result of the unmanned aerial vehicle can be obtained.
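As an illustrative, non-limiting sketch of steps S4 to S7, the following Python code uses OpenCV and NumPy to estimate the homography from the matched feature points, perspective-transform the infrared image center, and look up the geographic position in the reference image. The function name, the RANSAC reprojection threshold and the linear latitude/longitude interpolation from the reference-image corner coordinates are assumptions made for illustration only.

```python
# Sketch of steps S4-S7: homography estimation, perspective transform of the
# image centre, and geographic lookup. Matched points are assumed to be given
# as N x 2 pixel-coordinate arrays; the corner-based linear geo-interpolation
# is an illustrative assumption.
import cv2
import numpy as np

def locate_uav(pts_infrared, pts_pseudo_ir, ir_shape, vis_shape, geo_corners):
    """Estimate the UAV geographic position from matched feature points.

    pts_infrared : (N, 2) matched points in the real-time infrared image
    pts_pseudo_ir: (N, 2) corresponding points in the pseudo-infrared image
    ir_shape     : (h, w) of the real-time infrared image
    vis_shape    : (h, w) of the visible reference image
    geo_corners  : ((lat_tl, lon_tl), (lat_br, lon_br)) of the reference image
    """
    src = np.asarray(pts_infrared, dtype=np.float32)
    dst = np.asarray(pts_pseudo_ir, dtype=np.float32)

    # Step S4: homography from the matching relationship (RANSAC rejects outliers).
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    # Step S5: perspective-transform the infrared image centre (current UAV position).
    h, w = ir_shape
    centre = np.array([[[w / 2.0, h / 2.0]]], dtype=np.float32)
    u, v = cv2.perspectiveTransform(centre, H)[0, 0]

    # Step S6: the pseudo-infrared image is generated pixel-aligned with the
    # visible reference image, so the mapped pixel carries over directly.
    # Step S7: look up the geographic position; simple linear interpolation
    # between known corner coordinates is assumed here.
    (lat_tl, lon_tl), (lat_br, lon_br) = geo_corners
    vh, vw = vis_shape
    lat = lat_tl + (v / vh) * (lat_br - lat_tl)
    lon = lon_tl + (u / vw) * (lon_br - lon_tl)
    return lat, lon
```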
In one embodiment, the cross-modality image style migration network structure is a CycleGAN network structure, and the total loss function is:
$$\mathcal{L}(G,F,D_X,D_Y)=\mathcal{L}_{GAN}(G,D_Y,X,Y)+\mathcal{L}_{GAN}(F,D_X,Y,X)+\lambda_{cyc}\,\mathcal{L}_{cyc}(G,F)+\lambda_{idt}\,\mathcal{L}_{idt}(G,F)$$
wherein G is the first generator, F is the second generator, X is the source domain, Y is the target domain, $D_Y$ is the first discriminator and $D_X$ is the second discriminator; $\mathcal{L}(G,F,D_X,D_Y)$ is the total loss function, $\mathcal{L}_{GAN}(G,D_Y,X,Y)$ is the adversarial loss function between the first generator G and the first discriminator $D_Y$, $\mathcal{L}_{GAN}(F,D_X,Y,X)$ is the adversarial loss function between the second generator F and the second discriminator $D_X$, $\mathcal{L}_{cyc}(G,F)$ is the cycle consistency loss function, $\mathcal{L}_{idt}(G,F)$ is the ontology consistency loss function, $\lambda_{cyc}$ is the weight coefficient of the cycle consistency loss function, and $\lambda_{idt}$ is the weight coefficient of the ontology consistency loss function. Better results can be obtained by adjusting $\lambda_{cyc}$ and $\lambda_{idt}$.
In this embodiment, a cross-modal image style migration network structure is used to implement style migration from visible light images to infrared images, and FIG. 2 shows an image modality conversion schematic diagram. The CycleGAN network structure is a ring structure composed of two opposing GAN networks and can realize mutual conversion between images of the source domain X (the visible light image domain) and the target domain Y (the infrared image domain). FIG. 3 shows a schematic diagram of the CycleGAN network structure; referring to FIG. 3, it mainly comprises a first generator G, a second generator F, a first discriminator $D_Y$ and a second discriminator $D_X$. In the figure, y is a real image of the target domain Y and x is a real image of the source domain X; the first generator G maps images from the source domain X to the target domain Y, and conversely, the second generator F converts images of the target domain Y to the source domain X.
The traditional CycleGAN network structure calculates the generation loss using the mean square error (MSE). Because the squaring operation in MSE amplifies larger errors, outliers can significantly affect the prediction results and ultimately reduce the overall performance of the model. In addition, when the initial output value is large, the gradient update amplitude of the MSE loss function is small, so convergence is slow and model training is unstable. In view of this, in this embodiment the total loss function comprises three parts: besides the adversarial loss and the cycle consistency loss in the original CycleGAN, an ontology consistency loss is constructed to maintain the hue of the image and prevent the overall color of the image from changing.
Specifically, the adversarial loss describes the game between a generator and a discriminator, and two adversarial losses are designed. The adversarial loss function between the first generator G and the first discriminator $D_Y$ adopts the following formula:
$$\mathcal{L}_{GAN}(G,D_Y,X,Y)=\mathbb{E}_{y\sim p_{data}(y)}\big[\log D_Y(y)\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\log\big(1-D_Y(G(x))\big)\big]$$
wherein y is a real image of the target domain, x is a real image of the source domain, G(x) is the output of the first generator G when the input is x, $\mathbb{E}$ is the expected value, $D_Y(y)$ is the output of the first discriminator $D_Y$ when the input is y, and $D_Y(G(x))$ is the output of the first discriminator $D_Y$ when the input is G(x).
The adversarial loss function between the second generator F and the second discriminator $D_X$ adopts the following formula:
$$\mathcal{L}_{GAN}(F,D_X,Y,X)=\mathbb{E}_{x\sim p_{data}(x)}\big[\log D_X(x)\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\log\big(1-D_X(F(y))\big)\big]$$
wherein F(y) is the output of the second generator F when the input is y, $D_X(x)$ is the output of the second discriminator $D_X$ when the input is x, and $D_X(F(y))$ is the output of the second discriminator $D_X$ when the input is F(y).
The cycle consistency loss ensures that, after a full conversion cycle, an input image remains as close as possible to the original image, i.e. the forward cycle $x\rightarrow G(x)\rightarrow F(G(x))\approx x$ and the reverse cycle $y\rightarrow F(y)\rightarrow G(F(y))\approx y$. The total cycle consistency loss measures the $L_1$ distance between the reconstructed images and the real images, and the cycle consistency loss function adopts the following formula:
$$\mathcal{L}_{cyc}(G,F)=\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(G(x))-x\rVert_1\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(F(y))-y\rVert_1\big]$$
wherein F(G(x)) and G(F(y)) are the reconstructed images of the forward cycle and the reverse cycle, respectively, i.e. the output of the second generator F when the input is G(x) and the output of the first generator G when the input is F(y).
The ontology consistency loss constrains the generators to preserve image color, prevents the generated image from shifting in hue, and ensures that the generated image keeps the color configuration of the original image: feeding a target-domain image y into the first generator G, or a source-domain image x into the second generator F, should return the image itself. The ontology consistency loss function adopts the following formula:
$$\mathcal{L}_{idt}(G,F)=\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(y)-y\rVert_1\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(x)-x\rVert_1\big]$$
wherein G(y) is the output of the first generator G when the input is y, and F(x) is the output of the second generator F when the input is x.
The optimized objective function expression is:
$$G^{*},F^{*}=\arg\min_{G,F}\max_{D_X,D_Y}\mathcal{L}(G,F,D_X,D_Y)$$
wherein $G^{*}$ is the optimal solution of G and $F^{*}$ is the optimal solution of F.
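The four loss terms above can be combined as in the following minimal PyTorch sketch. The generator and discriminator modules G, F, D_Y, D_X are assumed to exist elsewhere, the discriminators are assumed to output logits, and the concrete weight values are illustrative; only the structure of the total loss follows the formulas above.

```python
# Minimal PyTorch sketch of the total generator loss: two adversarial terms,
# cycle consistency and ontology (identity) consistency. Modules G, F, D_Y, D_X
# and the exact adversarial criterion are assumptions made for illustration.
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()  # log-form adversarial loss on discriminator logits (assumed)
l1 = nn.L1Loss()

def total_generator_loss(G, F, D_Y, D_X, real_x, real_y,
                         lambda_cyc=10.0, lambda_idt=5.0):     # weights are illustrative
    fake_y = G(real_x)                                         # X -> Y (visible -> pseudo-infrared)
    fake_x = F(real_y)                                         # Y -> X

    # Adversarial terms: each generator tries to make its discriminator output "real".
    pred_y, pred_x = D_Y(fake_y), D_X(fake_x)
    loss_gan = (adv_criterion(pred_y, torch.ones_like(pred_y)) +
                adv_criterion(pred_x, torch.ones_like(pred_x)))

    # Cycle consistency: forward loop x -> G(x) -> F(G(x)) and reverse loop y -> F(y) -> G(F(y)).
    loss_cyc = l1(F(fake_y), real_x) + l1(G(fake_x), real_y)

    # Ontology (identity) consistency: G applied to a target-domain image
    # (and F to a source-domain image) should return the image itself.
    loss_idt = l1(G(real_y), real_y) + l1(F(real_x), real_x)

    return loss_gan + lambda_cyc * loss_cyc + lambda_idt * loss_idt
```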
Before the cross-modal image style migration network structure is applied to perform style migration on the visible light image, training is required to be performed on the cross-modal image style migration network structure, and the specific training process is as follows:
the training process uses an open source RGB-NIR scene dataset containing 9 pairs of visible and near infrared images, totaling 954 images. Before the image is input into the network, it is first normalized to 256×256 size.
In the training process, the number of iterations (epochs) is set to 200 and the batch size is set to 1. For the first 100 epochs the learning rate is fixed at 0.0002, and over the last 100 epochs the learning rate is adjusted adaptively with the Adam optimizer. To enhance the conversion effect from the visible light image to the infrared image, the forward reconstruction loss weight is set to 30 and the reverse reconstruction loss weight is set to 10, strengthening the importance of the first generator G. In addition, the weight coefficients of the cycle consistency loss $\lambda_{cyc}$ and the ontology consistency loss $\lambda_{idt}$ are set to a fixed ratio.
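One possible way to reproduce the stated schedule is sketched below. The linear decay to zero after epoch 100 and the Adam momentum parameters are assumptions; the description only states that the learning rate is fixed at 0.0002 for the first 100 epochs and adjusted adaptively afterwards.

```python
# Sketch of the training schedule: Adam at lr = 2e-4 for the first 100 epochs,
# then a learning-rate adjustment over the last 100 epochs (a linear decay to
# zero is assumed here).
import itertools
import torch

def build_optimizer_and_scheduler(G, F, n_epochs=200, decay_start=100, base_lr=2e-4):
    optimizer = torch.optim.Adam(
        itertools.chain(G.parameters(), F.parameters()),
        lr=base_lr, betas=(0.5, 0.999))        # betas are an illustrative choice

    def lr_lambda(epoch):
        # Keep the base rate for the first `decay_start` epochs, then decay linearly.
        if epoch < decay_start:
            return 1.0
        return 1.0 - (epoch - decay_start) / float(n_epochs - decay_start)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)
    return optimizer, scheduler

# Usage inside the training loop (batch size 1, 200 epochs):
# optimizer, scheduler = build_optimizer_and_scheduler(G, F)
# for epoch in range(200):
#     for real_x, real_y in dataloader:
#         loss = ...                        # e.g. the total generator loss above
#         optimizer.zero_grad(); loss.backward(); optimizer.step()
#     scheduler.step()
```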
In this embodiment, the generative adversarial network converts cross-mode images with large feature differences into the same feature domain, solving the problem of mismatching caused by the imaging differences of cross-mode images; meanwhile, besides the adversarial loss and the cycle consistency loss in the original CycleGAN network, the total loss function constructs an ontology consistency loss, which accelerates the convergence of the network.
In one embodiment, in step S3, a Transformer intelligent matching method is used to intelligently match the real-time infrared image and the pseudo-infrared image to obtain a feature point matching relationship, which includes:
Step S31, adopting a twin (Siamese) network with a ResNet50 backbone to perform feature extraction on the real-time infrared image and the pseudo-infrared image respectively, obtaining two feature maps, denoted I and I′, and stitching the two feature maps to obtain a stitched feature map. Here, the input images are first resized to 256×256, and the ResNet50 backbone serves as the feature extraction network of the Siamese network, yielding two feature maps with 1024 channels each; these are then projected to 16×16×256 to reduce the computation of the Transformer, and stitched to form a stitched feature map of size 16×32×256.
Step S32, adding a position encoding to the stitched feature map to obtain the context feature map. Here, the position encoding is added to the 16×32×256 feature map, producing a 16×32×256 context feature map.
Step S33, inputting the query point into the Transformer encoder-decoder structure together with the context feature map to obtain a high-dimensional vector. Here, the encoder and the decoder each have 6 layers, and each layer contains an 8-head self-attention module; the query point is a normalized coordinate position in the feature map I, and the Transformer encoder-decoder structure outputs a 256-dimensional vector.
Step S34, inputting the high-dimensional vector into the multi-layer perceptron to obtain the feature point matching relationship. Here, the multi-layer perceptron outputs the feature point in the feature map I′ that matches the query point, and these correspondences form the feature point matching relationship.
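Steps S31 to S34 can be summarized in the following PyTorch sketch. The ResNet50 truncation point, the learned positional encoding, the query-point embedding and the MLP head are assumptions made for illustration; only the stated sizes (two 16×16×256 feature maps stitched into 16×32×256, a 6-layer encoder and decoder with 8 attention heads, and a 256-dimensional output vector) are taken from the description above.

```python
# Illustrative sketch of the matching network (steps S31-S34); architecture
# details not stated in the description are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CrossModalMatcher(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        backbone = resnet50(weights=None)
        # Shared (Siamese) ResNet50 backbone truncated after the 1024-channel stage
        # (layer3), so a 3-channel 256x256 input gives a 16x16 feature map.
        self.backbone = nn.Sequential(*list(backbone.children())[:-3])
        self.proj = nn.Conv2d(1024, d_model, kernel_size=1)              # -> 16x16x256 per image
        self.pos_embed = nn.Parameter(torch.zeros(16 * 32, 1, d_model))  # learned positional encoding (assumed)
        self.transformer = nn.Transformer(d_model=d_model, nhead=8,
                                          num_encoder_layers=6, num_decoder_layers=6)
        self.query_embed = nn.Linear(2, d_model)                         # embed the normalized 2-D query point
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, 2))                  # matched point coordinates

    def forward(self, infrared, pseudo_infrared, query_xy):
        # Step S31: Siamese feature extraction and stitching into a 16x32 grid.
        f_a = self.proj(self.backbone(infrared))                         # (B, 256, 16, 16)
        f_b = self.proj(self.backbone(pseudo_infrared))                  # (B, 256, 16, 16)
        stitched = torch.cat([f_a, f_b], dim=3)                          # (B, 256, 16, 32)
        tokens = stitched.flatten(2).permute(2, 0, 1)                    # (512, B, 256)
        # Step S32: add the positional encoding to obtain the context feature map.
        tokens = tokens + self.pos_embed
        # Step S33: query point and context features through the encoder-decoder.
        query = self.query_embed(query_xy).unsqueeze(0)                  # (1, B, 256)
        decoded = self.transformer(tokens, query)                        # (1, B, 256) high-dimensional vector
        # Step S34: the MLP maps the 256-d vector to the matched point coordinates.
        return self.mlp(decoded.squeeze(0))
```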
Based on the same inventive concept as the Transformer-based cross-modal image matching and positioning method, this embodiment further provides a corresponding Transformer-based cross-modal image matching and positioning device. FIG. 4 shows a block diagram of a Transformer-based cross-modality image matching and positioning device according to an embodiment of the present application, which includes:
an image acquisition module 41, configured to acquire a real-time infrared image and a visible light image under a viewing angle of the unmanned aerial vehicle;
the image style migration module 42 is configured to perform style migration on the visible light image by using a cross-mode image style migration network structure, so as to obtain a pseudo infrared image;
the intelligent matching module 43 is configured to perform intelligent matching on the real-time infrared image and the pseudo-infrared image by a Transformer intelligent matching method, so as to obtain a feature point matching relationship;
a homography transformation matrix determining module 44, configured to determine a homography transformation matrix according to the feature point matching relationship;
the perspective transformation module 45 is used for performing perspective transformation on the center point of the real-time infrared image according to the homography transformation matrix, and determining a pixel point corresponding to the center point in the pseudo infrared image;
the mapping module 46 is configured to map a pixel point corresponding to the center point in the pseudo-infrared image onto the visible light image, and determine a mapping point in the visible light image;
the positioning result determining module 47 is configured to obtain a geographic positioning result of the unmanned aerial vehicle according to the geographic location information corresponding to the mapping point in the visible light image.
In the above embodiment, the specific implementation function of each module is consistent with the specific implementation manner of the foregoing method embodiment, and will not be described in detail.
In order to further verify the effectiveness of the positioning method of the present application, it is compared with an existing method; FIG. 5 shows a comparison of the positioning results of the positioning method of the present application and the existing method. Experiments prove that the method of the present application effectively realizes the cross-mode conversion from visible light images to infrared images and remarkably improves the successful matching rate; it performs well despite the large modality difference, high matching difficulty and poor robustness of cross-mode image matching, and therefore has practical significance and value in engineering applications.
The foregoing is merely various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (4)

1. A Transformer-based cross-mode image matching and positioning method, characterized by comprising the following steps of:
acquiring a real-time infrared image and a visible light image under the view angle of the unmanned aerial vehicle;
adopting a cross-mode image style migration network structure to perform style migration on the visible light image to obtain a pseudo infrared image;
performing intelligent matching on the real-time infrared image and the pseudo infrared image by adopting a Transformer intelligent matching method to obtain a characteristic point matching relationship;
determining a homography transformation matrix according to the characteristic point matching relation;
performing perspective transformation on a central point of the real-time infrared image according to the homography transformation matrix, and determining a pixel point corresponding to the central point in the pseudo infrared image;
mapping pixel points corresponding to the center point in the pseudo infrared image to the visible light image, and determining mapping points in the visible light image;
obtaining a geographic positioning result of the unmanned aerial vehicle according to the geographic position information corresponding to the mapping points in the visible light image;
the cross-mode image style migration network structure is a CycleGAN network structure, and the total loss function is as follows:
$$\mathcal{L}(G,F,D_X,D_Y)=\mathcal{L}_{GAN}(G,D_Y,X,Y)+\mathcal{L}_{GAN}(F,D_X,Y,X)+\lambda_{cyc}\,\mathcal{L}_{cyc}(G,F)+\lambda_{idt}\,\mathcal{L}_{idt}(G,F)$$
wherein G is the first generator, F is the second generator, X is the source domain, Y is the target domain, $D_Y$ is the first discriminator and $D_X$ is the second discriminator; $\mathcal{L}(G,F,D_X,D_Y)$ is the total loss function, $\mathcal{L}_{GAN}(G,D_Y,X,Y)$ is the adversarial loss function between the first generator G and the first discriminator $D_Y$, $\mathcal{L}_{GAN}(F,D_X,Y,X)$ is the adversarial loss function between the second generator F and the second discriminator $D_X$, $\mathcal{L}_{cyc}(G,F)$ is the cycle consistency loss function, $\mathcal{L}_{idt}(G,F)$ is the ontology consistency loss function, $\lambda_{cyc}$ is the weight coefficient of the cycle consistency loss function, and $\lambda_{idt}$ is the weight coefficient of the ontology consistency loss function;
the step of performing intelligent matching on the real-time infrared image and the pseudo infrared image by adopting a Transformer intelligent matching method to obtain a characteristic point matching relationship comprises:
respectively performing feature extraction on the real-time infrared image and the pseudo infrared image with a twin (Siamese) network built on a ResNet50 backbone to obtain two feature maps, and stitching the two feature maps to obtain a stitched feature map;
adding a position encoding to the stitched feature map to obtain a context feature map;
inputting the query points together with the context feature map into a Transformer encoder-decoder structure to obtain a high-dimensional vector;
and inputting the high-dimensional vector into a multi-layer perceptron to obtain the characteristic point matching relation.
2. The method of claim 1, wherein the adversarial loss function $\mathcal{L}_{GAN}(G,D_Y,X,Y)$ between the first generator G and the first discriminator $D_Y$ adopts the following formula:
$$\mathcal{L}_{GAN}(G,D_Y,X,Y)=\mathbb{E}_{y\sim p_{data}(y)}\big[\log D_Y(y)\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\log\big(1-D_Y(G(x))\big)\big]$$
wherein y is a real image of the target domain, x is a real image of the source domain, G(x) is the output of the first generator G when the input is x, $\mathbb{E}$ is the expected value, $D_Y(y)$ is the output of the first discriminator $D_Y$ when the input is y, and $D_Y(G(x))$ is the output of the first discriminator $D_Y$ when the input is G(x);
the adversarial loss function $\mathcal{L}_{GAN}(F,D_X,Y,X)$ between the second generator F and the second discriminator $D_X$ adopts the following formula:
$$\mathcal{L}_{GAN}(F,D_X,Y,X)=\mathbb{E}_{x\sim p_{data}(x)}\big[\log D_X(x)\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\log\big(1-D_X(F(y))\big)\big]$$
wherein F(y) is the output of the second generator F when the input is y, $D_X(x)$ is the output of the second discriminator $D_X$ when the input is x, and $D_X(F(y))$ is the output of the second discriminator $D_X$ when the input is F(y);
the cycle consistency loss function $\mathcal{L}_{cyc}(G,F)$ adopts the following formula:
$$\mathcal{L}_{cyc}(G,F)=\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(G(x))-x\rVert_1\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(F(y))-y\rVert_1\big]$$
wherein F(G(x)) is the output of the second generator F when the input is G(x), and G(F(y)) is the output of the first generator G when the input is F(y);
the ontology consistency loss function $\mathcal{L}_{idt}(G,F)$ adopts the following formula:
$$\mathcal{L}_{idt}(G,F)=\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(y)-y\rVert_1\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(x)-x\rVert_1\big]$$
wherein G(y) is the output of the first generator G when the input is y, and F(x) is the output of the second generator F when the input is x.
3. A Transformer-based cross-mode image matching and positioning device, characterized by comprising:
the image acquisition module is used for acquiring real-time infrared images and visible light images under the view angle of the unmanned aerial vehicle;
the image style migration module is used for performing style migration on the visible light image by adopting a cross-mode image style migration network structure to obtain a pseudo infrared image;
the intelligent matching module is used for intelligently matching the real-time infrared image and the pseudo infrared image by adopting a Transformer intelligent matching method to obtain a characteristic point matching relationship;
the homography transformation matrix determining module is used for determining a homography transformation matrix according to the characteristic point matching relation;
the perspective transformation module is used for carrying out perspective transformation on the central point of the real-time infrared image according to the homography transformation matrix, and determining the pixel point corresponding to the central point in the pseudo infrared image;
the mapping module is used for mapping the pixel points corresponding to the center point in the pseudo infrared image to the visible light image and determining mapping points in the visible light image;
the positioning result determining module is used for obtaining the geographic positioning result of the unmanned aerial vehicle according to the geographic position information corresponding to the mapping points in the visible light image;
the cross-mode image style migration network structure is a CycleGAN network structure, and the total loss function is as follows:
$$\mathcal{L}(G,F,D_X,D_Y)=\mathcal{L}_{GAN}(G,D_Y,X,Y)+\mathcal{L}_{GAN}(F,D_X,Y,X)+\lambda_{cyc}\,\mathcal{L}_{cyc}(G,F)+\lambda_{idt}\,\mathcal{L}_{idt}(G,F)$$
wherein G is the first generator, F is the second generator, X is the source domain, Y is the target domain, $D_Y$ is the first discriminator and $D_X$ is the second discriminator; $\mathcal{L}(G,F,D_X,D_Y)$ is the total loss function, $\mathcal{L}_{GAN}(G,D_Y,X,Y)$ is the adversarial loss function between the first generator G and the first discriminator $D_Y$, $\mathcal{L}_{GAN}(F,D_X,Y,X)$ is the adversarial loss function between the second generator F and the second discriminator $D_X$, $\mathcal{L}_{cyc}(G,F)$ is the cycle consistency loss function, $\mathcal{L}_{idt}(G,F)$ is the ontology consistency loss function, $\lambda_{cyc}$ is the weight coefficient of the cycle consistency loss function, and $\lambda_{idt}$ is the weight coefficient of the ontology consistency loss function;
the intelligent matching module is further configured for:
respectively performing feature extraction on the real-time infrared image and the pseudo infrared image with a twin (Siamese) network built on a ResNet50 backbone to obtain two feature maps, and stitching the two feature maps to obtain a stitched feature map;
adding a position encoding to the stitched feature map to obtain a context feature map;
inputting the query points together with the context feature map into a Transformer encoder-decoder structure to obtain a high-dimensional vector;
and inputting the high-dimensional vector into a multi-layer perceptron to obtain the characteristic point matching relation.
4. The apparatus of claim 3, wherein the adversarial loss function $\mathcal{L}_{GAN}(G,D_Y,X,Y)$ between the first generator G and the first discriminator $D_Y$ adopts the following formula:
$$\mathcal{L}_{GAN}(G,D_Y,X,Y)=\mathbb{E}_{y\sim p_{data}(y)}\big[\log D_Y(y)\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\log\big(1-D_Y(G(x))\big)\big]$$
wherein y is a real image of the target domain, x is a real image of the source domain, G(x) is the output of the first generator G when the input is x, $\mathbb{E}$ is the expected value, $D_Y(y)$ is the output of the first discriminator $D_Y$ when the input is y, and $D_Y(G(x))$ is the output of the first discriminator $D_Y$ when the input is G(x);
the adversarial loss function $\mathcal{L}_{GAN}(F,D_X,Y,X)$ between the second generator F and the second discriminator $D_X$ adopts the following formula:
$$\mathcal{L}_{GAN}(F,D_X,Y,X)=\mathbb{E}_{x\sim p_{data}(x)}\big[\log D_X(x)\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\log\big(1-D_X(F(y))\big)\big]$$
wherein F(y) is the output of the second generator F when the input is y, $D_X(x)$ is the output of the second discriminator $D_X$ when the input is x, and $D_X(F(y))$ is the output of the second discriminator $D_X$ when the input is F(y);
the cycle consistency loss function $\mathcal{L}_{cyc}(G,F)$ adopts the following formula:
$$\mathcal{L}_{cyc}(G,F)=\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(G(x))-x\rVert_1\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(F(y))-y\rVert_1\big]$$
wherein F(G(x)) is the output of the second generator F when the input is G(x), and G(F(y)) is the output of the first generator G when the input is F(y);
the ontology consistency loss function $\mathcal{L}_{idt}(G,F)$ adopts the following formula:
$$\mathcal{L}_{idt}(G,F)=\mathbb{E}_{y\sim p_{data}(y)}\big[\lVert G(y)-y\rVert_1\big]+\mathbb{E}_{x\sim p_{data}(x)}\big[\lVert F(x)-x\rVert_1\big]$$
wherein G(y) is the output of the first generator G when the input is y, and F(x) is the output of the second generator F when the input is x.
CN202310450328.2A 2023-04-25 2023-04-25 Transformer-based cross-mode image matching and positioning method and device Active CN116168221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310450328.2A CN116168221B (en) 2023-04-25 2023-04-25 Transformer-based cross-mode image matching and positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310450328.2A CN116168221B (en) 2023-04-25 2023-04-25 Transformer-based cross-mode image matching and positioning method and device

Publications (2)

Publication Number Publication Date
CN116168221A CN116168221A (en) 2023-05-26
CN116168221B true CN116168221B (en) 2023-07-25

Family

ID=86411762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310450328.2A Active CN116168221B (en) 2023-04-25 2023-04-25 Transformer-based cross-mode image matching and positioning method and device

Country Status (1)

Country Link
CN (1) CN116168221B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117994596B (en) * 2024-04-07 2024-06-04 四川大学华西医院 Intestinal ostomy image recognition and classification system based on twin network
CN118038499A (en) * 2024-04-12 2024-05-14 北京航空航天大学 Cross-mode pedestrian re-identification method based on mode conversion

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331029A (en) * 2022-08-19 2022-11-11 西安电子科技大学 Heterogeneous image matching method based on cross-mode conversion network and optimal transmission theory

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3103048B1 (en) * 2019-11-07 2021-10-22 Thales Sa PROCESS AND DEVICE FOR GENERATING SYNTHETIC LEARNING DATA FOR ARTIFICIAL INTELLIGENCE MACHINE FOR AIRCRAFT LANDING AID
CN111062905B (en) * 2019-12-17 2022-01-04 大连理工大学 Infrared and visible light fusion method based on saliency map enhancement
CN111723780B (en) * 2020-07-22 2023-04-18 浙江大学 Directional migration method and system of cross-domain data based on high-resolution remote sensing image
CN112149635A (en) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 Cross-modal face recognition model training method, device, equipment and storage medium
US11295477B1 (en) * 2021-05-19 2022-04-05 Motional Ad Llc Deep learning-based camera calibration
CN114529593A (en) * 2022-01-12 2022-05-24 西安电子科技大学 Infrared and visible light image registration method, system, equipment and image processing terminal
CN114417048A (en) * 2022-01-17 2022-04-29 中国计量大学 Unmanned aerial vehicle positioning method without positioning equipment based on image semantic guidance

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331029A (en) * 2022-08-19 2022-11-11 西安电子科技大学 Heterogeneous image matching method based on cross-mode conversion network and optimal transmission theory

Also Published As

Publication number Publication date
CN116168221A (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN116168221B (en) Transformer-based cross-mode image matching and positioning method and device
CN108303099B (en) Autonomous navigation method in unmanned plane room based on 3D vision SLAM
CN109509230B (en) SLAM method applied to multi-lens combined panoramic camera
Yuan et al. RGGNet: Tolerance aware LiDAR-camera online calibration with geometric deep learning and generative model
CN108534782B (en) Binocular vision system-based landmark map vehicle instant positioning method
CN110827415A (en) All-weather unknown environment unmanned autonomous working platform
CN104732518B (en) A kind of PTAM improved methods based on intelligent robot terrain surface specifications
CN104376552B (en) A kind of virtual combat method of 3D models and two dimensional image
CN108010085A (en) Target identification method based on binocular Visible Light Camera Yu thermal infrared camera
CN104835158B (en) Based on the three-dimensional point cloud acquisition methods of Gray code structured light and epipolar-line constraint
CN111536970B (en) Infrared inertial integrated navigation method for low-visibility large-scale scene
US11948309B2 (en) Systems and methods for jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator
CN114719848B (en) Unmanned aerial vehicle height estimation method based on vision and inertial navigation information fusion neural network
CN113298947A (en) Multi-source data fusion-based three-dimensional modeling method medium and system for transformer substation
CN117253029B (en) Image matching positioning method based on deep learning and computer equipment
Hu et al. Aerial monocular 3d object detection
CN114943757A (en) Unmanned aerial vehicle forest exploration system based on monocular depth of field prediction and depth reinforcement learning
CN105389819B (en) A kind of lower visible image method for correcting polar line of half calibration and system of robust
Zhang et al. RI-LIO: reflectivity image assisted tightly-coupled LiDAR-inertial odometry
CN116740488B (en) Training method and device for feature extraction model for visual positioning
Mithun et al. Cross-view visual geo-localization for outdoor augmented reality
CN116824433A (en) Visual-inertial navigation-radar fusion self-positioning method based on self-supervision neural network
CN108460829B (en) A kind of 3-D image register method for AR system
Guo et al. Unsupervised Multi-Spectrum Stereo Depth Estimation for All-Day Vision
CN108955687A (en) The synthesized positioning method of mobile robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant