CN116596815A - Image stitching method based on multi-stage alignment network - Google Patents

Image stitching method based on multi-stage alignment network

Info

Publication number
CN116596815A
Authority
CN
China
Prior art keywords
image
aligned
representing
target image
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310517330.7A
Other languages
Chinese (zh)
Inventor
范晓婷
张重
徐敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Normal University
Original Assignee
Tianjin Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Normal University filed Critical Tianjin Normal University
Priority to CN202310517330.7A priority Critical patent/CN116596815A/en
Publication of CN116596815A publication Critical patent/CN116596815A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20192Edge enhancement; Edge preservation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image stitching method based on a multi-stage alignment network, which comprises the following steps: step S1, acquiring a training data set, wherein the training data set comprises a plurality of input image pairs and the image stitching result corresponding to each input image pair, and each input image pair comprises a reference image I_1 and a target image I_2; step S2, constructing an image stitching depth model; step S3, training the image stitching depth model based on the training data set and an overall loss function to obtain a target image stitching depth model; and step S4, stitching the images to be stitched by using the target image stitching depth model to obtain an image stitching result. The invention uses multi-stage alignment to perform deeply optimized warping of the images, which reduces distortion of the image content while keeping the seams smooth, so that accurate image stitching is achieved.

Description

Image stitching method based on multi-stage alignment network
Technical Field
The invention relates to the field of image processing and deep learning, in particular to an image stitching method based on a multi-stage alignment network.
Background
As a key technique for obtaining high-resolution, wide-field panoramic images, image stitching aims to capture a plurality of images with overlapping areas by rotating a camera and to stitch them together through feature matching and image fusion. However, when the rotation angle of the image capturing device is large or the photographed scenes are not coplanar, the stitched images may exhibit visual artifacts and misalignment. Therefore, how to ensure accurate alignment and natural smoothness of wide field-of-view panoramic images remains a challenging problem in image stitching.
In recent years, a large number of image stitching methods have been proposed. Conventional image stitching methods can be classified into global alignment methods and spatially-varying warping methods. Global alignment methods match images using invariant local features and align them by establishing a mapping relation through a homography matrix, for example the dual-homography estimation method and the smoothly varying affine method. Spatially-varying warping methods divide an image into a uniform mesh and obtain the optimal mesh coordinates by optimizing a content-based mesh deformation function; examples include the as-projective-as-possible method and the as-natural-as-possible method. More recently, researchers have proposed image stitching methods based on deep learning to improve stitching performance. For example, Nie et al. propose an image stitching network based on global homography and eliminate image artifacts by constructing a structure stitching stage and a content revision stage. In view of the importance of edge preservation, Dai et al. propose an edge-guided fusion approach for image stitching. Jong et al. design a deep image rectangling solution for preserving the linear and nonlinear structures of the image. However, the performance of these image stitching methods still needs to be further improved.
In carrying out the invention, the inventors have found that the prior art suffers from at least the following drawbacks and deficiencies:
the methods in the prior art generally align images by estimating a single deep transformation, cannot effectively handle scenes with large parallax, and may distort the global structure of the panoramic image; the existing methods also ignore the importance of the image content and of the stitching seams during image stitching, which easily leads to inconsistent image content and discontinuous seams.
Disclosure of Invention
The invention provides an image stitching method based on a multi-stage alignment network. The invention uses a content-preserving depth homography estimation module to pre-align the input image pair and reduce content artifacts, uses an edge-assisted mesh deformation module to further align the image pair and avoid seam distortion, and constructs a content consistency loss and a seam smoothness loss to maintain the geometric structure of the image pair and reduce seam discontinuity in the overlapping area, thereby predicting high-quality image stitching results. The image stitching method achieves high-quality stitching of images, avoids content artifacts and reduces seam distortion.
The image stitching method based on the multi-stage alignment network provided by the invention comprises the following steps:
step S1, acquiring a training data set, wherein the training data set comprises a plurality of input image pairs and the image stitching result corresponding to each input image pair, and each input image pair comprises a reference image I_1 and a target image I_2;
S2, constructing an image stitching depth model;
step S3, training the image stitching depth model based on the training data set and the overall loss function to obtain a target image stitching depth model;
and step S4, stitching the images to be stitched by using the target image stitching depth model to obtain an image stitching result.
Optionally, the image stitching depth model includes an image pre-alignment sub-model for pre-aligning the input image pair using a content-preserving depth homography estimation module and an image alignment sub-model for further aligning the pre-aligned input image pair using an edge-assisted network.
Optionally, the depth homography estimation module is formed by interleaving a plurality of symmetric convolution layer units with a corresponding number of content-preserving attention modules, wherein each symmetric convolution layer unit comprises two convolution layers and one max-pooling layer; the content-preserving attention module comprises a spatial attention module and a plurality of cross operation modules, wherein the spatial attention module comprises two max-pooling layers, two average-pooling layers, a shared fully connected layer and an activation function layer.
Optionally, the edge-assisted network includes a convolutional layer, three multi-scale residual blocks, an upsampling layer, and a bottleneck layer.
Optionally, when the input image pair is pre-aligned using the image pre-alignment sub-model:
inputting the input image pair into the depth homography estimation module to obtain the output feature maps F_i^R and F_i^T corresponding to the reference image I_1 and the target image I_2 in the input image pair;
obtaining a homography matrix from the output feature maps F_i^R and F_i^T by using a direct linear transformation method;
warping the reference image I_1 and the target image I_2 respectively by using a spatial transformer network, so as to pre-align the pixel locations of the overlapping region of the reference image I_1 and the target image I_2, wherein the pre-aligned input image pair is represented as:
I_1^P = W_STN(I_1, E),  I_2^P = W_STN(I_2, H)
wherein E represents the identity matrix, H represents the homography matrix, W_STN(·,·) represents the output of the spatial transformer network, and I_1^P and I_2^P represent the pre-aligned reference image and target image, respectively.
Optionally, when aligning the pair of pre-aligned input images using the image alignment submodel:
obtaining basic feature maps of the pre-aligned reference image and target image by using the convolution layer in the edge-assisted network;
extracting edge feature maps of the pre-aligned reference image and target image by using the edge-assisted network;
concatenating the obtained edge feature maps of the pre-aligned reference image and target image with the corresponding basic feature maps, respectively, to obtain fused feature maps of the pre-aligned reference image and target image;
calculating feature flows of the pre-aligned reference image and target image by a context correlation method based on the fused feature maps of the pre-aligned reference image and target image;
feeding the pre-aligned reference image and target image and their corresponding feature flows into a depth mesh deformation network to obtain the aligned reference image I_1^A and target image I_2^A, wherein the aligned reference image I_1^A and target image I_2^A are expressed as:
I_1^A = W_mesh(I_1^P, S_1),  I_2^A = W_mesh(I_2^P, S_2)
wherein
F_1c = [F_1conv, F_1edge],  F_2c = [F_2conv, F_2edge]
wherein F_1conv and F_2conv respectively represent the basic feature maps of the pre-aligned image pair, F_1edge and F_2edge respectively represent the edge feature maps of the pre-aligned image pair, F_1c and F_2c respectively represent the fused feature maps of the pre-aligned image pair, S_1 and S_2 respectively represent the feature flows of the pre-aligned reference image and target image obtained by the context correlation method CCL(·,·) from the fused feature maps, [·,·] represents the concatenation operation, W_mesh(·,·) represents the depth mesh deformation network, and I_1^P and I_2^P represent the pre-aligned reference image and target image, respectively.
Optionally, the overall loss function includes a content consistency loss and a seam smoothness loss, and the overall loss function L_All is expressed as:
L_All = α·L_cont + β·L_seam
wherein L_cont represents the content consistency loss, L_seam represents the seam smoothness loss, and α and β are the weights of the content consistency loss and the seam smoothness loss, respectively.
Optionally, the content consistency loss consists of a photometric loss term and a structural loss term.
Optionally, the photometric loss term L_photo is expressed as:
L_photo = ||I_F - I_G||_1
wherein I_F and I_G respectively represent the final image stitching result and the ground truth, and ||·||_1 represents the L1 norm;
the structural loss term L_struc is expressed as:
L_struc = Σ_{i=1,2} ||φ_i(I_F) - φ_i(I_G)||_2
wherein φ_i(·) represents the output of the conv1_i layer in the VGG-16 network, and ||·||_2 represents the L2 norm.
Optionally, the seam smoothness loss L_seam is expressed as:
L_seam = ||E_1 - E_1G||_1 + ||E_2 - E_2G||_1
wherein
E_1 = E_net(I_1^A),  E_2 = E_net(I_2^A),  E_iG = div( ∇I_i^A / |∇I_i^A| ),  i = 1, 2
wherein E_1 and E_2 are the edge images of the aligned image pair, E_1G and E_2G are the ground-truth edge images of the aligned image pair obtained by using the curvature formula, E_net(·) represents the output of the edge-assisted network, I_1^A represents the aligned reference image, I_2^A represents the aligned target image, m and n represent the horizontal and vertical directions, and ∇ and div(·) represent the gradient and divergence operations, respectively.
The technical scheme provided by the invention has the beneficial effects that:
1. The invention can accurately align images, reduce distortion of the image content while keeping the seams smooth, and obtain high-quality image stitching results.
2. The invention solves the image stitching problem with deep learning technology, reduces image alignment artifacts through multi-stage alignment, and reduces seam discontinuity with the aid of edge information and the seam smoothness loss.
Drawings
FIG. 1 is a flow chart of a method for image stitching based on a multi-stage alignment network according to an embodiment of the present invention;
FIG. 2(a) is a schematic diagram of the structure of the content-preserving attention module according to an embodiment of the present invention, wherein ⊗ represents pixel-level multiplication;
FIG. 2(b) is a schematic diagram of the structure of the spatial attention module according to an embodiment of the present invention, wherein ⊕ represents pixel-level addition and σ represents the sigmoid function;
FIG. 3 is a schematic diagram of an edge assist network architecture according to an embodiment of the invention;
FIG. 4 is a schematic diagram showing structural similarity comparison results of different image stitching methods according to an embodiment of the present invention.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent by the following detailed description of the present invention with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
Fig. 1 is a flowchart of an image stitching method based on a multi-stage alignment network according to an embodiment of the present invention, and some specific implementation procedures of the present invention are described below by taking fig. 1 as an example. The image stitching method based on the multi-stage alignment network provided by the invention comprises the following steps:
step S1, acquiring a training data set, wherein the training data set comprises a plurality of input image pairs and the image stitching result corresponding to each input image pair, and each input image pair comprises a reference image I_1 and a target image I_2;
S2, constructing an image stitching depth model;
in an embodiment of the present invention, the image stitching depth model includes an image pre-alignment sub-model and an image alignment sub-model.
The image pre-alignment sub-model is used for pre-aligning the input image pair by using a content-preserving depth homography estimation module, so as to reduce content artifacts and obtain a pre-aligned input image pair.
Further, the depth homography estimation module is formed by interleaving a plurality of symmetric convolution layer units with a corresponding number of content-preserving attention modules, that is, a content-preserving attention module is arranged between every two symmetric convolution layer units so as to find correct matching features and suppress wrong matching features, as shown in FIG. 2(a). Each symmetric convolution layer unit comprises two convolution layers and a max-pooling layer; the content-preserving attention module comprises a spatial attention module and a plurality of cross operation modules, as shown in FIG. 2(a); the spatial attention module comprises two max-pooling layers, two average-pooling layers, one shared fully connected layer and one sigmoid layer, as shown in FIG. 2(b).
Assume that the depth homography estimation module includes i+1 symmetric convolution layer units and i+1 content-preserving attention modules. For the depth homography estimation module, the reference image I_1 and the target image I_2 are fed into the first-level symmetric convolution layer unit to generate the first-level feature maps of I_1 and I_2; after passing through the content-preserving attention module between the first-level and the second-level symmetric convolution layer units, the first-level weighted feature maps F_0^R and F_0^T corresponding to I_1 and I_2 are obtained. F_0^R and F_0^T are then fed into the second-level symmetric convolution layer unit to obtain the second-level feature maps; after passing through the content-preserving attention module between the second-level and the third-level symmetric convolution layer units, the second-level weighted feature maps F_1^R and F_1^T corresponding to I_1 and I_2 are obtained. Proceeding in this way, after the last, (i+1)-th symmetric convolution layer unit and the (i+1)-th content-preserving attention module connected to it, the output feature maps F_i^R and F_i^T corresponding to I_1 and I_2 are obtained. As shown in FIG. 2(a), inside each content-preserving attention module the spatial-level feature maps of the reference image and the target image are obtained by multiplying the corresponding level feature maps pixel by pixel (⊗) with the spatial attention masks M_s(·), i.e. the outputs of the spatial attention module, and the output feature maps F_i^R and F_i^T are then produced from these spatial-level feature maps through the cross operation modules.
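For illustration only, a minimal PyTorch sketch of one content-preserving attention stage consistent with the description above is given below; the class names, channel sizes and the concrete form of the cross operation are assumptions made for exposition and are not specified by this disclosure.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Sketch of the spatial attention module of FIG. 2(b): channel-wise max/average
    # pooling, a shared layer standing in for the shared fully connected layer,
    # pixel-level addition and a sigmoid, producing a spatial mask M_s.
    def __init__(self, hidden=8):
        super().__init__()
        self.shared_fc = nn.Sequential(
            nn.Conv2d(1, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=1),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                        # x: (B, C, H, W)
        max_map, _ = x.max(dim=1, keepdim=True)  # channel-wise max pooling
        avg_map = x.mean(dim=1, keepdim=True)    # channel-wise average pooling
        return self.sigmoid(self.shared_fc(max_map) + self.shared_fc(avg_map))

class ContentPreservingAttention(nn.Module):
    # Sketch of one content-preserving attention stage: each feature map is
    # re-weighted pixel by pixel with its spatial attention mask, and a cross
    # operation (here an exchange-and-fuse convolution, an assumption) mixes the
    # reference and target streams.
    def __init__(self, channels):
        super().__init__()
        self.attention = SpatialAttention()
        self.cross = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_ref, feat_tgt):
        s_ref = feat_ref * self.attention(feat_ref)   # pixel-level multiplication
        s_tgt = feat_tgt * self.attention(feat_tgt)
        out_ref = self.cross(torch.cat([s_ref, s_tgt], dim=1))
        out_tgt = self.cross(torch.cat([s_tgt, s_ref], dim=1))
        return out_ref, out_tgt                        # weighted feature maps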
When the image pre-alignment submodel is utilized to pre-align an input image pair:
firstly, inputting the input image pair into the depth homography estimation module to obtain the output feature maps F_i^R and F_i^T corresponding to the reference image I_1 and the target image I_2 in the input image pair;
then, obtaining the homography matrix from the output feature maps F_i^R and F_i^T by using the direct linear transformation method; obtaining a homography matrix by direct linear transformation is a technique well known to those skilled in the art and is not described in detail here;
then, warping the reference image I_1 and the target image I_2 respectively by using a spatial transformer network, that is, achieving pre-alignment of the pixel locations of the overlapping region of the reference image I_1 and the target image I_2, wherein the pre-aligned input image pair may be represented as:
I_1^P = W_STN(I_1, E),  I_2^P = W_STN(I_2, H)
wherein E represents the identity matrix, H represents the homography matrix, W_STN(·,·) represents the output of the spatial transformer network, and I_1^P and I_2^P represent the pre-aligned reference image and target image, respectively.
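As an illustration of this pre-alignment step, the sketch below solves a homography from four corner correspondences by direct linear transformation and warps an image with a grid-sampling spatial transformer (playing the role of W_STN); how the corner offsets are predicted from F_i^R and F_i^T is not shown, and the function names are assumptions.

import torch
import torch.nn.functional as F

def dlt_homography(src_pts, dst_pts):
    # Direct linear transformation: solve the 3x3 matrix H mapping src_pts to dst_pts.
    # src_pts, dst_pts: (4, 2) tensors of corresponding corner coordinates.
    rows = []
    for (x, y), (u, v) in zip(src_pts.tolist(), dst_pts.tolist()):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = torch.tensor(rows)                      # (8, 9)
    _, _, vh = torch.linalg.svd(A)
    H = vh[-1].reshape(3, 3)                    # null-space vector of A
    return H / H[2, 2]

def warp_image(img, H):
    # Spatial-transformer-style warp: back-project the output grid through H^-1
    # and resample. img: (1, C, h, w), float tensor.
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pts = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    src = (torch.linalg.inv(H) @ pts.T).T
    src = src[:, :2] / src[:, 2:3]
    src[:, 0] = src[:, 0] / (w - 1) * 2 - 1     # normalize x to [-1, 1]
    src[:, 1] = src[:, 1] / (h - 1) * 2 - 1     # normalize y to [-1, 1]
    return F.grid_sample(img, src.reshape(1, h, w, 2), align_corners=True)

# Pre-alignment as in the text: the reference image keeps the identity matrix E,
# the target image is warped with the estimated homography H.
# pre_ref = warp_image(ref, torch.eye(3)); pre_tgt = warp_image(tgt, H)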
The image alignment sub-model is configured to further align the pre-aligned input image pair using an edge-assisted network so as to reduce seam distortion.
Further, the edge-assisted network is an edge-based mesh deformation module that includes a convolutional layer, three multi-scale residual blocks, an upsampling layer, and a bottleneck layer, as shown in FIG. 3.
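A minimal PyTorch sketch of a network with the stated layout (one convolutional layer, three multi-scale residual blocks, an upsampling layer and a bottleneck layer) follows; the channel widths, kernel sizes and the internal form of the multi-scale residual block are assumptions chosen only to make the structure concrete.

import torch
import torch.nn as nn

class MultiScaleResidualBlock(nn.Module):
    # Residual block with parallel 3x3 and 5x5 branches (one common multi-scale
    # design; the exact structure used in the patent is not specified).
    def __init__(self, channels):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([self.relu(self.branch3(x)), self.relu(self.branch5(x))], dim=1)
        return x + self.fuse(y)                   # residual connection

class EdgeAssistNetwork(nn.Module):
    # Edge-assisted network: conv layer -> 3 multi-scale residual blocks ->
    # upsampling layer -> bottleneck layer producing a single-channel edge map.
    # Assumes even spatial dimensions so the upsampling undoes the strided conv.
    def __init__(self, in_channels=3, channels=32):
        super().__init__()
        self.head = nn.Conv2d(in_channels, channels, 3, stride=2, padding=1)
        self.blocks = nn.Sequential(*[MultiScaleResidualBlock(channels) for _ in range(3)])
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.bottleneck = nn.Conv2d(channels, 1, 1)

    def forward(self, x):
        f = self.head(x)
        f = self.blocks(f)
        f = self.up(f)
        return self.bottleneck(f)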
When the image alignment submodel is utilized to align the pre-aligned input image pair:
firstly, the basic feature maps of the pre-aligned reference image and target image are obtained by using the convolution layer in the edge-assisted network;
then, the edge feature maps of the pre-aligned reference image and target image are extracted by using the edge-assisted network;
then, the obtained edge feature maps of the pre-aligned reference image and target image are respectively concatenated with the corresponding basic feature maps to obtain the fused feature maps of the pre-aligned reference image and target image;
then, the feature flows of the pre-aligned reference image and target image are calculated by a context correlation method based on the fused feature maps of the pre-aligned reference image and target image;
finally, the pre-aligned reference image and target image and their corresponding feature flows are fed into a depth mesh deformation network to obtain the aligned reference image I_1^A and target image I_2^A, wherein the aligned reference image I_1^A and target image I_2^A can be expressed as:
I_1^A = W_mesh(I_1^P, S_1),  I_2^A = W_mesh(I_2^P, S_2)
wherein
F_1c = [F_1conv, F_1edge],  F_2c = [F_2conv, F_2edge]
wherein F_1conv and F_2conv respectively represent the basic feature maps of the pre-aligned image pair, F_1edge and F_2edge respectively represent the edge feature maps of the pre-aligned image pair, F_1c and F_2c respectively represent the fused feature maps of the pre-aligned image pair, S_1 and S_2 respectively represent the feature flows of the pre-aligned reference image and target image obtained by the context correlation method CCL(·,·) from the fused feature maps, [·,·] represents the concatenation operation, W_mesh(·,·) represents the output of the depth mesh deformation network, and I_1^P and I_2^P represent the pre-aligned reference image and target image, respectively.
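To make the feature-flow computation concrete, the sketch below evaluates a local correlation volume between the two fused feature maps and turns it into a flow field by a soft-argmax; this is one plausible reading of the context correlation method CCL(·,·), and the search radius, normalization and function names are assumptions.

import torch
import torch.nn.functional as F

def context_correlation(feat_ref, feat_tgt, radius=4):
    # Local correlation between two fused feature maps (CCL sketch).
    # feat_ref, feat_tgt: (B, C, H, W). Returns a feature flow of shape (B, 2, H, W).
    B, C, H, W = feat_ref.shape
    feat_ref = F.normalize(feat_ref, dim=1)
    feat_tgt = F.normalize(feat_tgt, dim=1)
    pad = F.pad(feat_tgt, [radius] * 4)          # (2*radius+1)^2 search window
    costs, offsets = [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[:, :, radius + dy:radius + dy + H, radius + dx:radius + dx + W]
            costs.append((feat_ref * shifted).sum(dim=1))       # (B, H, W)
            offsets.append((dx, dy))
    cost = torch.stack(costs, dim=1)                            # (B, K, H, W)
    prob = torch.softmax(cost, dim=1)
    offs = torch.tensor(offsets, dtype=torch.float32)           # (K, 2) as (dx, dy)
    flow_x = (prob * offs[:, 0].view(1, -1, 1, 1)).sum(dim=1)
    flow_y = (prob * offs[:, 1].view(1, -1, 1, 1)).sum(dim=1)
    return torch.stack([flow_x, flow_y], dim=1)                 # feature flow

# Fused features: concatenate basic (conv) features with edge features, as in the text:
# F_1c = torch.cat([F_1conv, F_1edge], dim=1); F_2c = torch.cat([F_2conv, F_2edge], dim=1)
# flow = context_correlation(F_1c, F_2c)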
Step S3: training the image stitching depth model based on the training data set and the overall loss function to obtain a target image stitching depth model;
In one embodiment of the invention, the overall loss function includes a content consistency loss and a seam smoothness loss, which are used to preserve the geometric structure of the image pair and to reduce seam discontinuities in the overlapping region, respectively.
The overall loss function L_All can be expressed as:
L_All = α·L_cont + β·L_seam
wherein L_cont represents the content consistency loss, L_seam represents the seam smoothness loss, and α and β are the weights of the content consistency loss and the seam smoothness loss, respectively.
In an embodiment of the present invention, the weights α and β may each be set to 0.5.
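For illustration, a single training iteration of step S3 with this weighting could look like the sketch below; the model interface, batch keys and the two loss callables (sketched after the following paragraphs) are assumptions, not part of the disclosure.

alpha, beta = 0.5, 0.5   # weights of the content consistency and seam smoothness losses

def training_step(model, batch, content_loss, seam_loss, optimizer):
    # One optimization step: forward the image stitching depth model, evaluate
    # L_All = alpha * L_cont + beta * L_seam, and update the network weights.
    stitched, aligned_ref, aligned_tgt = model(batch["reference"], batch["target"])
    l_cont = content_loss(stitched, batch["ground_truth"])
    l_seam = seam_loss(aligned_ref, aligned_tgt)
    l_all = alpha * l_cont + beta * l_seam
    optimizer.zero_grad()
    l_all.backward()
    optimizer.step()
    return float(l_all.detach())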
The content consistency loss consists of a photometric loss term and a structural loss term; the photometric loss term is used for minimizing pixel differences between the image stitching result and the ground truth, and the structural loss term is used for constraining the image stitching result and the ground truth to have similar feature representations.
further, the luminosity loss term L photo Can be expressed as:
L photo =||I F -I G || 1
wherein ,IF and IG Respectively representing a final image splicing result and a true value 1 Represents an L1 norm;
the structural loss term L_struc can be expressed as:
L_struc = Σ_{i=1,2} ||φ_i(I_F) - φ_i(I_G)||_2
wherein φ_i(·) represents the output of the conv1_i layer in the VGG-16 network, which can be obtained by a person skilled in the art; the receptive field of each pixel in conv1_1 and conv1_2 covers a 5×5 neighborhood, and ||·||_2 represents the L2 norm.
The seam smoothness loss is used to constrain the edge images of the aligned image pair to be close to the ground-truth edge images of the aligned image pair.
Further, the seam smoothness loss L_seam can be expressed as:
L_seam = ||E_1 - E_1G||_1 + ||E_2 - E_2G||_1
wherein
E_1 = E_net(I_1^A),  E_2 = E_net(I_2^A),  E_iG = div( ∇I_i^A / |∇I_i^A| ),  i = 1, 2
wherein E_1 and E_2 are the edge images of the aligned image pair, E_1G and E_2G are the ground-truth edge images of the aligned image pair obtained by using the curvature formula, E_net(·) represents the output of the edge-assisted network, I_1^A represents the aligned reference image, I_2^A represents the aligned target image, m and n represent the horizontal and vertical directions, and ∇ and div(·) represent the gradient and divergence operations, respectively.
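The following sketch illustrates one way to realize the seam smoothness loss under these definitions: ground-truth edge maps are obtained from the aligned images with the curvature formula div(∇I/|∇I|) computed by finite differences and compared with the edge maps predicted by the edge-assisted network. The finite-difference scheme, the small epsilon, and the to_gray and edge_net callables (edge_net standing for an edge-assisted network such as the EdgeAssistNetwork sketch above) are assumptions.

import torch
import torch.nn.functional as F

def curvature_edge(img, eps=1e-6):
    # Ground-truth edge map E_G = div( grad(I) / |grad(I)| ), by finite differences.
    # img: (B, 1, H, W) grayscale image.
    gx = F.pad(img[:, :, :, 1:] - img[:, :, :, :-1], (0, 1, 0, 0))   # horizontal (m)
    gy = F.pad(img[:, :, 1:, :] - img[:, :, :-1, :], (0, 0, 0, 1))   # vertical (n)
    mag = torch.sqrt(gx ** 2 + gy ** 2 + eps)
    nx, ny = gx / mag, gy / mag
    # divergence of the normalized gradient field
    div_x = F.pad(nx[:, :, :, 1:] - nx[:, :, :, :-1], (1, 0, 0, 0))
    div_y = F.pad(ny[:, :, 1:, :] - ny[:, :, :-1, :], (0, 0, 1, 0))
    return div_x + div_y

def seam_smoothness_loss(edge_net, aligned_ref, aligned_tgt, to_gray):
    # L_seam = ||E_1 - E_1G||_1 + ||E_2 - E_2G||_1, with E_i predicted by the
    # edge-assisted network and E_iG computed from the aligned images.
    loss = 0.0
    for img in (aligned_ref, aligned_tgt):
        e_pred = edge_net(img)                        # E_i = E_net(aligned image)
        e_gt = curvature_edge(to_gray(img)).detach()  # E_iG from the curvature formula
        loss = loss + torch.abs(e_pred - e_gt).mean()
    return loss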
And step S4, stitching the images to be stitched by using the target image stitching depth model to obtain a high-quality image stitching result.
FIG. 4 shows the structural similarity comparison results of the image stitching results obtained by different methods. The comparison algorithms include the method of Zaragoza and the method of Zhao, where the method of Zaragoza is a traditional image stitching method and the method of Zhao is a deep-learning-based image stitching method. The greater the structural similarity, the higher the quality of the image stitching result. As can be seen from FIG. 4, the structural similarity of the present invention is greater than that of the Zaragoza method, illustrating the important role of the content-preserving depth homography model in image stitching. In addition, the method of Zhao also performs worse than the present invention in terms of structural similarity. The main reason is that the method of Zhao uses only a single deep homography for image alignment, which can produce undesirable alignment distortion and thus lead to seam discontinuities. In contrast, the present invention reduces content artifacts and seam distortion in the image stitching results by constructing a multi-stage alignment model and combining the content consistency loss with the seam smoothness loss.
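For reference, the structural similarity between a stitched result and its ground truth can be computed, for example, with scikit-image as in the short sketch below; the file names are placeholders.

from skimage import io
from skimage.metrics import structural_similarity

stitched = io.imread("stitched_result.png")      # placeholder file names
reference = io.imread("ground_truth.png")
score = structural_similarity(stitched, reference, channel_axis=-1, data_range=255)
print(f"SSIM: {score:.4f}")                      # higher means better stitching quality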
The embodiments of the present invention do not limit the specific models of the devices involved, as long as the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are schematic representations of only one preferred embodiment, and that the above-described embodiment numbers are merely for illustration purposes and do not represent advantages or disadvantages of the embodiments.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (10)

1. An image stitching method based on a multi-stage alignment network, comprising the steps of:
step S1, acquiring a training data set, wherein the training data set comprises a plurality of input image pairs and the image stitching result corresponding to each input image pair, and each input image pair comprises a reference image I_1 and a target image I_2;
S2, constructing an image stitching depth model;
step S3, training the image stitching depth model based on the training data set and the overall loss function to obtain a target image stitching depth model;
and step S4, stitching the images to be stitched by using the target image stitching depth model to obtain an image stitching result.
2. The method of claim 1, wherein the image stitching depth model comprises an image pre-alignment sub-model for pre-aligning the input image pair using a content-preserving depth homography estimation module and an image alignment sub-model for further aligning the pre-aligned input image pair using an edge-assisted network.
3. The method of claim 2, wherein the depth homography estimation module is formed by interleaving a plurality of symmetric convolution layer units with a corresponding number of content-preserving attention modules, wherein each symmetric convolution layer unit includes two convolution layers and a max-pooling layer; the content-preserving attention module comprises a spatial attention module and a plurality of cross operation modules, wherein the spatial attention module comprises two max-pooling layers, two average-pooling layers, a shared fully connected layer and an activation function layer.
4. The method of claim 2, wherein the edge-assisted network comprises a convolutional layer, three multi-scale residual blocks, an upsampling layer, and a bottleneck layer.
5. The method of claim 2, wherein, when pre-aligning an input image pair with the image pre-alignment submodel:
inputting the input image pair into the depth homography estimation module to obtain the output feature maps F_i^R and F_i^T corresponding to the reference image I_1 and the target image I_2 in the input image pair;
obtaining a homography matrix from the output feature maps F_i^R and F_i^T by using a direct linear transformation method;
warping the reference image I_1 and the target image I_2 respectively by using a spatial transformer network, so as to pre-align the pixel locations of the overlapping region of the reference image I_1 and the target image I_2, wherein the pre-aligned input image pair is represented as:
I_1^P = W_STN(I_1, E),  I_2^P = W_STN(I_2, H)
wherein E represents the identity matrix, H represents the homography matrix, W_STN(·,·) represents the output of the spatial transformer network, and I_1^P and I_2^P represent the pre-aligned reference image and target image, respectively.
6. The method of claim 2, wherein, when aligning a pre-aligned input image pair using the image alignment sub-model:
obtaining basic feature maps of the pre-aligned reference image and target image by using the convolution layer in the edge-assisted network;
extracting edge feature maps of the pre-aligned reference image and target image by using the edge-assisted network;
concatenating the obtained edge feature maps of the pre-aligned reference image and target image with the corresponding basic feature maps, respectively, to obtain fused feature maps of the pre-aligned reference image and target image;
calculating feature flows of the pre-aligned reference image and target image by a context correlation method based on the fused feature maps of the pre-aligned reference image and target image;
feeding the pre-aligned reference image and target image and their corresponding feature flows into a depth mesh deformation network to obtain the aligned reference image I_1^A and target image I_2^A, wherein the aligned reference image I_1^A and target image I_2^A are expressed as:
I_1^A = W_mesh(I_1^P, S_1),  I_2^A = W_mesh(I_2^P, S_2)
wherein
F_1c = [F_1conv, F_1edge],  F_2c = [F_2conv, F_2edge]
wherein F_1conv and F_2conv respectively represent the basic feature maps of the pre-aligned image pair, F_1edge and F_2edge respectively represent the edge feature maps of the pre-aligned image pair, F_1c and F_2c respectively represent the fused feature maps of the pre-aligned image pair, S_1 and S_2 respectively represent the feature flows of the pre-aligned reference image and target image obtained by the context correlation method CCL(·,·) from the fused feature maps, [·,·] represents the concatenation operation, W_mesh(·,·) represents the depth mesh deformation network, and I_1^P and I_2^P represent the pre-aligned reference image and target image, respectively.
7. The method of claim 1, wherein the overall loss function comprises a content consistency loss and a seam smoothness loss, and the overall loss function L_All is expressed as:
L_All = α·L_cont + β·L_seam
wherein L_cont represents the content consistency loss, L_seam represents the seam smoothness loss, and α and β are the weights of the content consistency loss and the seam smoothness loss, respectively.
8. The method of claim 7, wherein the content consistency penalty consists of a photometric penalty term and a structural penalty term.
9. The method of claim 8, wherein the photometric loss term L_photo is expressed as:
L_photo = ||I_F - I_G||_1
wherein I_F and I_G respectively represent the final image stitching result and the ground truth, and ||·||_1 represents the L1 norm;
the structural loss term L_struc is expressed as:
L_struc = Σ_{i=1,2} ||φ_i(I_F) - φ_i(I_G)||_2
wherein φ_i(·) represents the output of the conv1_i layer in the VGG-16 network, and ||·||_2 represents the L2 norm.
10. The method of claim 7, wherein the seam smoothness loss L_seam is expressed as:
L_seam = ||E_1 - E_1G||_1 + ||E_2 - E_2G||_1
wherein
E_1 = E_net(I_1^A),  E_2 = E_net(I_2^A),  E_iG = div( ∇I_i^A / |∇I_i^A| ),  i = 1, 2
wherein E_1 and E_2 are the edge images of the aligned image pair, E_1G and E_2G are the ground-truth edge images of the aligned image pair obtained by using the curvature formula, E_net(·) represents the output of the edge-assisted network, I_1^A represents the aligned reference image, I_2^A represents the aligned target image, m and n represent the horizontal and vertical directions, and ∇ and div(·) represent the gradient and divergence operations, respectively.
CN202310517330.7A 2023-05-09 2023-05-09 Image stitching method based on multi-stage alignment network Pending CN116596815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310517330.7A CN116596815A (en) 2023-05-09 2023-05-09 Image stitching method based on multi-stage alignment network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310517330.7A CN116596815A (en) 2023-05-09 2023-05-09 Image stitching method based on multi-stage alignment network

Publications (1)

Publication Number Publication Date
CN116596815A true CN116596815A (en) 2023-08-15

Family

ID=87594825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310517330.7A Pending CN116596815A (en) 2023-05-09 2023-05-09 Image stitching method based on multi-stage alignment network

Country Status (1)

Country Link
CN (1) CN116596815A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710207A (en) * 2024-02-05 2024-03-15 天津师范大学 Image stitching method based on progressive alignment and interweaving fusion network
CN117710207B (en) * 2024-02-05 2024-07-12 天津师范大学 Image stitching method based on progressive alignment and interweaving fusion network

Similar Documents

Publication Publication Date Title
CN110570371B (en) Image defogging method based on multi-scale residual error learning
CN105761233A (en) FPGA-based real-time panoramic image mosaic method
CN107734268A (en) A kind of structure-preserved wide baseline video joining method
CN108171735B (en) Billion pixel video alignment method and system based on deep learning
CN111626927B (en) Binocular image super-resolution method, system and device adopting parallax constraint
CN112288628B (en) Aerial image splicing acceleration method and system based on optical flow tracking and frame extraction mapping
CN106910208A (en) A kind of scene image joining method that there is moving target
CN105488777A (en) System and method for generating panoramic picture in real time based on moving foreground
CN116596815A (en) Image stitching method based on multi-stage alignment network
CN106846249A (en) A kind of panoramic video joining method
CN114820408A (en) Infrared and visible light image fusion method based on self-attention and convolutional neural network
CN105069749A (en) Splicing method for tire mold images
CN111654621B (en) Dual-focus camera continuous digital zooming method based on convolutional neural network model
CN110838086A (en) Outdoor image splicing method based on correlation template matching
CN116152068A (en) Splicing method for solar panel images
CN103793891A (en) Low-complexity panorama image joint method
CN107330856B (en) Panoramic imaging method based on projective transformation and thin plate spline
CN113112404A (en) Image splicing method and device based on sliding window
CN117173012A (en) Unsupervised multi-view image generation method, device, equipment and storage medium
Lai et al. Hyperspectral Image Super Resolution With Real Unaligned RGB Guidance
CN111047513A (en) Robust image alignment method and device for cylindrical panoramic stitching
WO2022247394A1 (en) Image splicing method and apparatus, and storage medium and electronic device
CN115578260A (en) Attention method and system for direction decoupling for image super-resolution
CN113450394B (en) Different-size image registration method based on Siamese network
CN115249206A (en) Image super-resolution reconstruction method of lightweight attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination