CN116596815A - Image stitching method based on multi-stage alignment network - Google Patents
- Publication number
- CN116596815A (application CN202310517330.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- aligned
- representing
- target image
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20172—Image enhancement details
- G06T2207/20192—Edge enhancement; Edge preservation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an image stitching method based on a multi-stage alignment network, which comprises the following steps: step S1, acquiring a training data set, wherein the training data set comprises a plurality of input image pairs and the image stitching result corresponding to each input image pair, and each input image pair comprises a reference image I1 and a target image I2; step S2, constructing an image stitching depth model; step S3, training the image stitching depth model based on the training data set and an overall loss function to obtain a target image stitching depth model; and step S4, stitching the images to be stitched by using the target image stitching depth model to obtain an image stitching result. The invention uses multi-stage alignment to deeply optimize the deformation of the images, reduces distortion of the image content while keeping the seams smooth, and thereby achieves accurate stitching of the images.
Description
Technical Field
The invention relates to the fields of image processing and deep learning, and in particular to an image stitching method based on a multi-stage alignment network.
Background
As a key technique for obtaining high-resolution, wide-field panoramic images, image stitching aims to acquire multiple images with overlapping regions by rotating a camera and to stitch them together through feature matching and image fusion. However, when the rotation angle of the capture device is large or the photographed scenes are not coplanar, the stitched images may exhibit visual artifacts and misalignment. How to ensure accurate alignment and natural smoothness of wide-field panoramic images is therefore a challenging problem in image stitching.
In recent years, researchers have proposed a large number of image stitching methods. Conventional image stitching methods fall into global alignment methods and spatially varying warping methods. Global alignment methods match images using invariant local features and align them by establishing a mapping relation through a homography matrix, for example the dual-homography estimation method and the smoothly varying affine method. Spatially varying warping methods divide an image into uniform grids and obtain optimal grid coordinates by optimizing a content-based grid deformation function; they include the as-projective-as-possible method, the as-natural-as-possible method, and others. More recently, researchers have proposed deep-learning-based image stitching methods to improve stitching performance. For example, Nie et al. proposed an image stitching network based on global homography that eliminates image artifacts by constructing a structure stitching stage and a content revision stage. In view of the importance of edge preservation, Dai et al. proposed an edge-guided fusion approach for image stitching. Jong et al. devised a deep image rectangling solution for preserving the linear and nonlinear structures of the image. However, the performance of these image stitching methods still needs further improvement.
In carrying out the invention, the inventors found that the prior art has at least the following drawbacks and deficiencies:
Prior-art methods generally align images by estimating a single deep homography transformation; they cannot effectively handle large-parallax scenes and may distort the global structure of the panoramic image. Existing methods also ignore the importance of image content and stitching seams during the stitching process, which easily leads to inconsistent image content and discontinuous seams.
Disclosure of Invention
The invention provides an image stitching method based on a multi-stage alignment network. The method uses a content-preservation-based deep homography estimation module to pre-align the input image pair and reduce content artifacts, uses an edge-assisted mesh deformation module to further align the image pair and avoid seam distortion, and constructs a content consistency loss and a seam smoothness loss to maintain the geometric structure of the image pair and reduce seam discontinuity in the overlapping region, thereby predicting high-quality image stitching results. The method achieves high-quality stitching of images while avoiding content artifacts and reducing seam distortion.
The image stitching method based on a multi-stage alignment network provided by the invention comprises the following steps:
Step S1, acquiring a training data set, wherein the training data set comprises a plurality of input image pairs and the image stitching result corresponding to each input image pair, and each input image pair comprises a reference image I1 and a target image I2;
Step S2, constructing an image stitching depth model;
Step S3, training the image stitching depth model based on the training data set and the overall loss function to obtain a target image stitching depth model;
Step S4, stitching the images to be stitched by using the target image stitching depth model to obtain an image stitching result.
Optionally, the image stitching depth model comprises an image pre-alignment sub-model and an image alignment sub-model: the image pre-alignment sub-model pre-aligns the input image pair using a content-preservation-based deep homography estimation module, and the image alignment sub-model further aligns the pre-aligned input image pair using an edge-assisted network.
Optionally, the deep homography estimation module is formed by interleaving a plurality of symmetric convolution-layer units with a corresponding number of content-preserving attention modules, wherein each symmetric convolution-layer unit comprises two convolution layers and one max-pooling layer; each content-preserving attention module comprises a spatial attention module and a plurality of cross-operation modules, and the spatial attention module comprises two max-pooling layers, two average-pooling layers, a shared fully connected layer and an activation function layer.
Optionally, the edge-assisted network comprises a convolution layer, three multi-scale residual blocks, an upsampling layer and a bottleneck layer.
Optionally, when the input image pair is pre-aligned using the image pre-alignment sub-model:
the input image pair is fed into the deep homography estimation module to obtain the output feature maps F_i^R and F_i^T corresponding to the reference image I1 and the target image I2;
a homography matrix is obtained from the output feature maps F_i^R and F_i^T by the direct linear transformation method;
the reference image I1 and the target image I2 are then warped by a spatial transformer network to pre-align the pixel locations of their overlapping region, wherein the pre-aligned input image pair is expressed as:
Ĩ1 = W_STN(I1, E),  Ĩ2 = W_STN(I2, H)
where E denotes the identity matrix, H denotes the homography matrix, and W_STN(·,·) denotes the output of the spatial transformer network.
Optionally, when the pre-aligned input image pair is aligned using the image alignment sub-model:
basic feature maps of the pre-aligned reference and target images are obtained using the convolution layer of the edge-assisted network;
edge feature maps of the pre-aligned reference and target images are extracted using the edge-assisted network;
the obtained edge feature maps are concatenated with the corresponding basic feature maps to obtain fused feature maps of the pre-aligned reference and target images;
feature flows of the pre-aligned reference and target images are computed from the fused feature maps using a contextual correlation method;
the pre-aligned reference and target images together with their feature flows are fed into a deep mesh deformation network to obtain the aligned reference image Î1 and target image Î2, which are expressed as:
Î1 = W_mesh(Ĩ1, CCL(F_1c, F_2c)),  Î2 = W_mesh(Ĩ2, CCL(F_1c, F_2c))
where
F_1c = [F_1conv, F_1edge],  F_2c = [F_2conv, F_2edge]
and F_1conv and F_2conv denote the basic feature maps of the pre-aligned image pair, F_1edge and F_2edge denote the edge feature maps of the pre-aligned image pair, F_1c and F_2c denote the fused feature maps of the pre-aligned image pair, [·,·] denotes the concatenation operation, CCL(·,·) denotes the contextual correlation method, W_mesh(·,·) denotes the deep mesh deformation network, Ĩ1 denotes the pre-aligned reference image, and Ĩ2 denotes the pre-aligned target image.
Optionally, the overall loss function comprises a content consistency loss and a seam smoothness loss, and the overall loss function L_All is expressed as:
L_All = αL_cont + βL_seam
where L_cont denotes the content consistency loss, L_seam denotes the seam smoothness loss, and α and β are the weights of the content consistency loss and the seam smoothness loss, respectively.
Optionally, the content consistency loss consists of a photometric loss term and a structural loss term.
Optionally, the photometric loss term L_photo is expressed as:
L_photo = ||I_F - I_G||_1
where I_F and I_G denote the final image stitching result and the ground truth, respectively, and ||·||_1 denotes the L1 norm;
the structural loss term L_struc is expressed as:
L_struc = Σ_i ||φ_i(I_F) - φ_i(I_G)||_2
where φ_i(·) denotes the output of conv1_i in the VGG-16 network and ||·||_2 denotes the L2 norm.
Optionally, the seam smoothness loss L_seam is expressed as:
L_seam = ||E_1 - E_1G||_1 + ||E_2 - E_2G||_1
where
E_1 = E_net(Î1),  E_2 = E_net(Î2)
and E_1 and E_2 are the edge images of the aligned image pair, E_1G and E_2G are the ground-truth edge images of the aligned image pair obtained using the curvature formula, E_net(·) denotes the output of the edge-assisted network, Î1 denotes the aligned reference image, Î2 denotes the aligned target image, m and n denote the horizontal and vertical directions, and ∇(·) and div(·) denote the gradient and divergence operations, respectively.
The technical scheme provided by the invention has the following beneficial effects:
1. The invention can accurately align images, reduce distortion of the image content while keeping the seams smooth, and obtain high-quality image stitching results.
2. The invention solves the image stitching problem using deep learning, reduces image alignment artifacts through multi-stage alignment, and reduces seam discontinuity with the assistance of edge information and a seam smoothness loss.
Drawings
FIG. 1 is a flowchart of the image stitching method based on a multi-stage alignment network according to an embodiment of the present invention;
FIG. 2(a) is a schematic diagram of the content-preserving attention module structure according to an embodiment of the invention, wherein ⊗ denotes pixel-level multiplication;
FIG. 2(b) is a schematic diagram of the spatial attention module structure according to an embodiment of the invention, wherein ⊕ denotes pixel-level addition and σ denotes the sigmoid function;
FIG. 3 is a schematic diagram of the edge-assisted network architecture according to an embodiment of the invention;
FIG. 4 shows the structural-similarity comparison results of different image stitching methods according to an embodiment of the present invention.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent by the following detailed description of the present invention with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
Fig. 1 is a flowchart of the image stitching method based on a multi-stage alignment network according to an embodiment of the present invention; some specific implementation steps of the invention are described below with reference to Fig. 1. The image stitching method based on a multi-stage alignment network provided by the invention comprises the following steps:
Step S1, acquiring a training data set, wherein the training data set comprises a plurality of input image pairs and the image stitching result corresponding to each input image pair, and each input image pair comprises a reference image I1 and a target image I2;
Step S2, constructing an image stitching depth model;
in an embodiment of the present invention, the image stitching depth model includes an image pre-alignment sub-model and an image alignment sub-model.
The image pre-alignment sub-model pre-aligns the input image pair using a content-preservation-based deep homography estimation module so as to reduce content artifacts and obtain a pre-aligned input image pair.
Further, the deep homography estimation module is formed by interleaving a plurality of symmetric convolution-layer units with a corresponding number of content-preserving attention modules; that is, a content-preserving attention module is placed between every two symmetric convolution-layer units so as to find correct matching features and suppress wrong ones, as shown in Fig. 2(a). Each symmetric convolution-layer unit comprises two convolution layers and one max-pooling layer; each content-preserving attention module comprises a spatial attention module and a plurality of cross-operation modules, as shown in Fig. 2(a); the spatial attention module comprises two max-pooling layers, two average-pooling layers, one shared fully connected layer and one sigmoid layer, as shown in Fig. 2(b).
Assuming the deep homography estimation module comprises i+1 symmetric convolution-layer units and i+1 content-preserving attention modules, the reference image I1 and the target image I2 are first fed into the first-stage symmetric convolution-layer unit of the module to generate their first-level feature maps. After the content-preserving attention module between the first-stage and second-stage symmetric convolution-layer units, the first-level weighted feature maps F_0^R and F_0^T corresponding to I1 and I2 are obtained. These pass through the second-stage symmetric convolution-layer unit to yield the second-level feature maps, and through the content-preserving attention module between the second-stage and third-stage units to yield the second-level weighted feature maps F_1^R and F_1^T, and so on. After the last-stage symmetric convolution-layer unit and the (i+1)-th content-preserving attention module connected to it, the output feature maps F_i^R and F_i^T corresponding to I1 and I2 are obtained, as shown in Fig. 2(a). Within each content-preserving attention module, the spatial-level feature maps of the reference and target images are obtained by multiplying the current-level feature maps pixel by pixel with the corresponding spatial attention masks M_s(·), i.e. the outputs of the spatial attention module, where ⊗ denotes pixel-level multiplication; the output feature maps are then formed from these spatial-level feature maps by the cross-operation modules.
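For illustration only (not part of the claimed invention), the pixel-level attention weighting described above can be sketched in a few lines of NumPy. The sketch assumes channel-wise average and max pooling and a two-element shared weight vector standing in for the shared fully connected layer; all function names are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention_mask(feat, w):
    """feat: (C, H, W) feature map. Pools along the channel axis, mixes the
    two pooled maps with a shared weight vector w of length 2, and squashes
    to (0, 1) with a sigmoid -- a CBAM-style stand-in for M_s."""
    avg_pool = feat.mean(axis=0)          # (H, W)
    max_pool = feat.max(axis=0)           # (H, W)
    mixed = w[0] * avg_pool + w[1] * max_pool
    return sigmoid(mixed)

def apply_attention(feat, w):
    """Pixel-level multiplication of the feature map by its spatial mask."""
    return feat * spatial_attention_mask(feat, w)[None, :, :]
```

In the real module the mask is learned; here the weights are fixed purely to show the data flow (pool, mix, sigmoid, multiply).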
When the input image pair is pre-aligned using the image pre-alignment sub-model:
firstly, the input image pair is fed into the deep homography estimation module to obtain the output feature maps F_i^R and F_i^T corresponding to the reference image I1 and the target image I2;
then, a homography matrix is obtained from the output feature maps F_i^R and F_i^T by the direct linear transformation method; obtaining a homography matrix by direct linear transformation is a technique that should be familiar to those skilled in the art and is not repeated here;
finally, the reference image I1 and the target image I2 are warped by a spatial transformer network to pre-align the pixel locations of their overlapping region, wherein the pre-aligned input image pair may be expressed as:
Ĩ1 = W_STN(I1, E),  Ĩ2 = W_STN(I2, H)
where E denotes the identity matrix, H denotes the homography matrix, and W_STN(·,·) denotes the output of the spatial transformer network.
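As a minimal illustrative sketch of the direct linear transformation step (assuming NumPy; the function names are hypothetical), the following estimates a homography from four or more point correspondences and remaps points with it, approximating what the spatial transformer network does to pixel coordinates:

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct linear transform: estimate H (3x3) such that dst ~ H @ src
    in homogeneous coordinates, from >= 4 correspondences.
    src, dst: (N, 2) arrays of matched points."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, i.e. the right-singular
    # vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_points(H, pts):
    """Apply a homography to (N, 2) points and de-homogenize."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homo @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In practice the module regresses four corner offsets and solves the same DLT system; this sketch only shows the algebra shared by both.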
The image alignment sub-model further aligns the pre-aligned input image pair using an edge-assisted network so as to reduce seam distortion.
Further, the edge-assisted network is an edge-based mesh deformation module comprising a convolution layer, three multi-scale residual blocks, an upsampling layer and a bottleneck layer, as shown in Fig. 3.
When the pre-aligned input image pair is aligned using the image alignment sub-model:
firstly, the basic feature maps of the pre-aligned reference and target images are obtained using the convolution layer of the edge-assisted network;
then, the edge feature maps of the pre-aligned reference and target images are extracted using the edge-assisted network;
next, the obtained edge feature maps are concatenated with the corresponding basic feature maps to obtain the fused feature maps of the pre-aligned reference and target images;
then, the feature flows of the pre-aligned reference and target images are computed from the fused feature maps using the contextual correlation method;
finally, the pre-aligned reference and target images together with their feature flows are fed into the deep mesh deformation network to obtain the aligned reference image Î1 and target image Î2, which may be expressed as:
Î1 = W_mesh(Ĩ1, CCL(F_1c, F_2c)),  Î2 = W_mesh(Ĩ2, CCL(F_1c, F_2c))
where
F_1c = [F_1conv, F_1edge],  F_2c = [F_2conv, F_2edge]
and F_1conv and F_2conv denote the basic feature maps of the pre-aligned image pair, F_1edge and F_2edge denote the edge feature maps, F_1c and F_2c denote the fused feature maps, [·,·] denotes the concatenation operation, CCL(·,·) denotes the contextual correlation method, W_mesh(·,·) denotes the output of the deep mesh deformation network, Ĩ1 denotes the pre-aligned reference image, and Ĩ2 denotes the pre-aligned target image.
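As a toy stand-in for the contextual correlation step (illustrative only, far simpler than a learned CCL), the following sketch correlates two feature maps over a small search window and reads off the best-matching displacement as a coarse feature flow:

```python
import numpy as np

def correlation_flow(f1, f2, radius=1):
    """For each position of f1 (C, H, W), dot-product-correlate with a
    (2*radius+1)^2 window of f2 and return the displacement (dy, dx) of
    the best match as a coarse flow field of shape (2, H, W)."""
    c, h, w = f1.shape
    flow = np.zeros((2, h, w))
    for y in range(h):
        for x in range(w):
            best, best_d = -np.inf, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        score = float(f1[:, y, x] @ f2[:, yy, xx])
                        if score > best:
                            best, best_d = score, (dy, dx)
            flow[:, y, x] = best_d
    return flow
```

A learned contextual correlation layer computes a dense correlation volume and regresses the flow; this brute-force argmax only illustrates the matching idea.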
Step S3: training the image stitching depth model based on the training data set and the overall loss function to obtain a target image stitching depth model;
In one embodiment of the invention, the overall loss function comprises a content consistency loss and a seam smoothness loss, which together preserve the geometric structure of the image pair and reduce seam discontinuities in the overlapping region.
The overall loss function L_All may be expressed as:
L_All = αL_cont + βL_seam
where L_cont denotes the content consistency loss, L_seam denotes the seam smoothness loss, and α and β are the weights of the content consistency loss and the seam smoothness loss, respectively.
In an embodiment of the present invention, the weights α and β may each be set to 0.5.
The content consistency loss consists of a photometric loss term and a structural loss term: the photometric loss term minimizes the pixel differences between the image stitching result and the ground truth, and the structural loss term constrains the image stitching result and the ground truth to have similar feature representations.
Further, the photometric loss term L_photo may be expressed as:
L_photo = ||I_F - I_G||_1
where I_F and I_G denote the final image stitching result and the ground truth, respectively, and ||·||_1 denotes the L1 norm;
the structural loss term L_struc may be expressed as:
L_struc = Σ_i ||φ_i(I_F) - φ_i(I_G)||_2
where φ_i(·) denotes the output of conv1_i in the VGG-16 network, which is obtainable by those skilled in the art; the receptive field of each pixel in conv1_1 and conv1_2 is a 5×5 neighborhood, and ||·||_2 denotes the L2 norm.
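The two content-consistency terms can be sketched as follows (assuming NumPy). The lists feats_f / feats_g stand in for the VGG-16 conv1_i activations, and the additive combination of the two terms is an assumption made for illustration:

```python
import numpy as np

def photometric_loss(i_f, i_g):
    """L_photo = ||I_F - I_G||_1: sum of absolute pixel differences
    between the stitching result and the ground truth."""
    return float(np.abs(i_f - i_g).sum())

def structural_loss(feats_f, feats_g):
    """Structural term: sum over layers i of ||phi_i(I_F) - phi_i(I_G)||_2,
    where the lists stand in for VGG-16 conv1_i feature maps."""
    return sum(float(np.linalg.norm(a - b)) for a, b in zip(feats_f, feats_g))

def content_consistency_loss(i_f, i_g, feats_f, feats_g):
    """Assumed combination L_cont = L_photo + L_struc (the patent states
    only that the loss consists of these two terms)."""
    return photometric_loss(i_f, i_g) + structural_loss(feats_f, feats_g)
```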
The seam smoothness loss constrains the edge images of the aligned image pair to be close to their ground-truth edge images.
Further, the seam smoothness loss L_seam may be expressed as:
L_seam = ||E_1 - E_1G||_1 + ||E_2 - E_2G||_1
where
E_1 = E_net(Î1),  E_2 = E_net(Î2)
and E_1 and E_2 are the edge images of the aligned image pair, E_1G and E_2G are the ground-truth edge images of the aligned image pair obtained using the curvature formula, E_net(·) denotes the output of the edge-assisted network, m and n denote the horizontal and vertical directions, and ∇(·) and div(·) denote the gradient and divergence operations, respectively.
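A minimal sketch of the seam smoothness loss follows, together with a hypothetical curvature-based edge ground truth. The common curvature formula div(∇I/|∇I|) is assumed, since the patent names but does not spell out the formula:

```python
import numpy as np

def seam_smoothness_loss(e1, e1_gt, e2, e2_gt):
    """L_seam = ||E_1 - E_1G||_1 + ||E_2 - E_2G||_1: L1 distance between the
    predicted edge maps of the aligned pair and their ground truths."""
    return float(np.abs(e1 - e1_gt).sum() + np.abs(e2 - e2_gt).sum())

def curvature_edge(img):
    """Hypothetical stand-in for the 'curvature formula' ground truth:
    divergence of the normalized gradient field, div(grad I / |grad I|)."""
    gy, gx = np.gradient(img.astype(float))          # vertical, horizontal
    mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-8          # avoid division by zero
    nx, ny = gx / mag, gy / mag
    return np.gradient(nx, axis=1) + np.gradient(ny, axis=0)
```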
Step S4, stitching the images to be stitched using the target image stitching depth model to obtain a high-quality image stitching result.
Fig. 4 shows the structural-similarity comparison of the image stitching results obtained by different methods. The comparison algorithms include the method of Zaragoza et al. and the method of Zhao et al., where the former is a conventional image stitching method and the latter is a deep-learning-based image stitching method. The greater the structural similarity, the higher the quality of the stitching result. As can be seen from Fig. 4, the structural similarity of the present invention is greater than that of Zaragoza's method, illustrating the important role of the content-preservation-based deep homography model in image stitching. In addition, Zhao's method also performs worse than the present invention in terms of structural similarity, mainly because it uses only a single deep homography for image alignment, which can produce undesirable alignment distortion and thus lead to seam discontinuities. In contrast, the invention reduces the content artifacts and seam distortions of the stitching results by constructing a multi-stage alignment model and combining the content consistency loss with the seam smoothness loss.
The embodiment of the invention does not limit the types of other devices except the types of the devices, so long as the devices can complete the functions.
Those skilled in the art will appreciate that the drawings are schematic representations of only one preferred embodiment, and that the above-described embodiment numbers are merely for illustration purposes and do not represent advantages or disadvantages of the embodiments.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
Claims (10)
1. An image stitching method based on a multi-stage alignment network, comprising the steps of:
Step S1, acquiring a training data set, wherein the training data set comprises a plurality of input image pairs and the image stitching result corresponding to each input image pair, and each input image pair comprises a reference image I1 and a target image I2;
Step S2, constructing an image stitching depth model;
Step S3, training the image stitching depth model based on the training data set and the overall loss function to obtain a target image stitching depth model;
Step S4, stitching the images to be stitched by using the target image stitching depth model to obtain an image stitching result.
2. The method of claim 1, wherein the image stitching depth model comprises an image pre-alignment sub-model and an image alignment sub-model, the image pre-alignment sub-model being used for pre-aligning the input image pair with a content-preservation-based deep homography estimation module and the image alignment sub-model being used for further aligning the pre-aligned input image pair with an edge-assisted network.
3. The method of claim 2, wherein the deep homography estimation module is formed by interleaving a plurality of symmetric convolution-layer units with a corresponding number of content-preserving attention modules, wherein each symmetric convolution-layer unit comprises two convolution layers and one max-pooling layer; each content-preserving attention module comprises a spatial attention module and a plurality of cross-operation modules, and the spatial attention module comprises two max-pooling layers, two average-pooling layers, a shared fully connected layer and an activation function layer.
4. The method of claim 2, wherein the edge-assisted network comprises a convolution layer, three multi-scale residual blocks, an upsampling layer and a bottleneck layer.
5. The method of claim 2, wherein pre-aligning the input image pair with the image pre-alignment sub-model comprises:
inputting the input image pair into the deep homography estimation module to obtain output feature maps F_i^R and F_i^T corresponding to the reference image I_1 and the target image I_2 of the input image pair;
obtaining a homography matrix from the output feature maps F_i^R and F_i^T by the direct linear transformation (DLT) method;
warping the reference image I_1 and the target image I_2 with a spatial transformer network, respectively, so as to pre-align the pixel positions of the overlapping region of I_1 and I_2, wherein the pre-aligned input image pair is represented as:

Î_1 = W_STN(I_1, E),  Î_2 = W_STN(I_2, H)

wherein E represents the identity matrix, H represents the homography matrix, and W_STN(·, ·) represents the output of the spatial transformer network.
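The two numerical steps of claim 5, DLT homography estimation and homography warping, can be sketched in numpy. This is a minimal stand-in, not the patent's network: the four correspondences below are hypothetical inputs (the patent derives them from the feature maps F_i^R and F_i^T), and the warp uses nearest-neighbour sampling where a spatial transformer would use differentiable bilinear sampling.

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct linear transform: solve A h = 0 for the homography mapping
    src points onto dst points, taking the SVD null-space vector."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]                  # normalise so H[2, 2] = 1

def warp_homography(img, H):
    """Inverse-warp: each output pixel samples the input image at
    H^-1 (x, y, 1)^T; nearest neighbour, zero outside the frame."""
    n_rows, n_cols = img.shape
    ys, xs = np.mgrid[0:n_rows, 0:n_cols]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    sx = np.rint(src[0] / src[2]).astype(int).reshape(n_rows, n_cols)
    sy = np.rint(src[1] / src[2]).astype(int).reshape(n_rows, n_cols)
    valid = (sx >= 0) & (sx < n_cols) & (sy >= 0) & (sy < n_rows)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out

# Pre-alignment as in claim 5: the reference keeps the identity E,
# the target would be warped by the estimated homography H.
src = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
dst = [(10.0, 5.0), (110.0, 5.0), (110.0, 105.0), (10.0, 105.0)]
H = dlt_homography(src, dst)            # pure translation by (10, 5)
img = np.arange(16.0).reshape(4, 4)
ref_pre = warp_homography(img, np.eye(3))   # identity E: image unchanged
```

With four exact correspondences the 8x9 DLT system has a one-dimensional null space, so the SVD recovers the homography up to scale; the final division fixes the scale.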
6. The method of claim 2, wherein aligning the pre-aligned input image pair with the image alignment sub-model comprises:
obtaining basic feature maps of the pre-aligned reference image and target image with the convolution layer of the edge-assisted network;
extracting edge feature maps of the pre-aligned reference image and target image with the edge-assisted network;
concatenating the edge feature maps of the pre-aligned reference image and target image with the corresponding basic feature maps to obtain fused feature maps of the pre-aligned reference image and target image;
computing feature flows f_1 and f_2 of the pre-aligned reference image and target image from their fused feature maps by the context correlation method;
feeding the pre-aligned reference image and target image together with their feature flows into a deep mesh deformation network to obtain the aligned reference image Ĩ_1 and target image Ĩ_2, represented as:

Ĩ_1 = W_mesh(Î_1, f_1),  Ĩ_2 = W_mesh(Î_2, f_2)

wherein

F_1c = [F_1conv, F_1edge],  F_2c = [F_2conv, F_2edge],  (f_1, f_2) = CCL(F_1c, F_2c)

wherein F_1conv and F_2conv represent the basic feature maps of the pre-aligned image pair, F_1edge and F_2edge represent the edge feature maps of the pre-aligned image pair, F_1c and F_2c represent the fused feature maps of the pre-aligned image pair, [·, ·] represents the concatenation operation, CCL(·, ·) represents the context correlation method, W_mesh(·, ·) represents the deep mesh deformation network, Î_1 represents the pre-aligned reference image, and Î_2 represents the pre-aligned target image.
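The context correlation step can be illustrated with a crude, non-learned stand-in: an exhaustive local correlation whose argmax displacement plays the role of the feature flow. The patent's CCL is a differentiable network component; the function below is only a sketch under that analogy, with hypothetical names.

```python
import numpy as np

def local_correlation_flow(f1, f2, radius=2):
    """Stand-in for a context correlation layer: for each position of f1,
    correlate its feature vector against a (2*radius+1)^2 search window
    in f2 and return the displacement of the best match as a (dy, dx)
    feature-flow field."""
    h, w, _ = f1.shape
    pad = np.pad(f2, ((radius, radius), (radius, radius), (0, 0)))
    flow = np.zeros((h, w, 2))
    for y in range(h):
        for x in range(w):
            win = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            corr = np.tensordot(win, f1[y, x], axes=([2], [0]))
            dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
            flow[y, x] = (dy - radius, dx - radius)
    return flow

# One-hot "features" make every match unambiguous: shifting the map one
# column to the right must yield a flow of (0, +1) wherever the true
# match still lies inside the search window.
f1 = np.eye(64).reshape(8, 8, 64)
f2 = np.roll(f1, 1, axis=1)
flow = local_correlation_flow(f1, f2)
```

A learned CCL would regress the flow from the whole correlation volume instead of taking a hard argmax, which keeps the operation differentiable.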
7. The method of claim 1, wherein the overall loss function comprises a content consistency loss and a seam smoothness loss, the overall loss function L_All being expressed as:

L_All = αL_cont + βL_seam

wherein L_cont represents the content consistency loss, L_seam represents the seam smoothness loss, and α and β are the weights of the content consistency loss and the seam smoothness loss, respectively.
8. The method of claim 7, wherein the content consistency loss consists of a photometric loss term and a structural loss term.
9. The method of claim 8, wherein the photometric loss term L_photo is expressed as:

L_photo = ‖I_F − I_G‖_1

wherein I_F and I_G represent the final image stitching result and its ground truth, respectively, and ‖·‖_1 represents the L1 norm;

and the structural loss term L_struc is expressed as:

L_struc = Σ_i ‖φ_i(I_F) − φ_i(I_G)‖_2

wherein φ_i(·) represents the output of the conv1_i layer of the VGG-16 network, and ‖·‖_2 represents the L2 norm.
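A minimal numpy sketch of the two content-consistency terms of claims 8 and 9. The VGG-16 features φ_i are replaced by arbitrary caller-supplied extractors; the gradient-based `feats` below are a hypothetical stand-in for illustration, not the patent's choice.

```python
import numpy as np

def photometric_loss(i_f, i_g):
    """L_photo = || I_F - I_G ||_1  (sum of absolute pixel differences)."""
    return np.abs(i_f - i_g).sum()

def structure_loss(i_f, i_g, extractors):
    """L_struc = sum_i || phi_i(I_F) - phi_i(I_G) ||_2. The patent takes
    phi_i from VGG-16 conv layers; here `extractors` is any list of
    feature-extractor callables standing in for them."""
    return sum(np.linalg.norm(f(i_f) - f(i_g)) for f in extractors)

# Hypothetical stand-in "features": horizontal and vertical gradients.
feats = [lambda im: np.gradient(im, axis=1),
         lambda im: np.gradient(im, axis=0)]
```

A content consistency loss would then combine the two terms, e.g. `photometric_loss(a, b) + structure_loss(a, b, feats)`.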
10. The method of claim 7, wherein the seam smoothness loss L_seam is expressed as:

L_seam = ‖E_1 − E_1G‖_1 + ‖E_2 − E_2G‖_1

wherein

E_1 = E_net(Ĩ_1),  E_2 = E_net(Ĩ_2),  E_kG = div(∇Ĩ_k / |∇Ĩ_k|), k = 1, 2

wherein E_1 and E_2 are the edge maps of the aligned image pair, E_1G and E_2G are the ground-truth edge maps of the aligned image pair obtained with the curvature formula, E_net(·) represents the output of the edge-assisted network, Ĩ_1 represents the aligned reference image, Ĩ_2 represents the aligned target image, m and n represent the horizontal and vertical directions of the gradient, and ∇ and div(·) represent the gradient and divergence operations, respectively.
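The curvature-based edge ground truth of claim 10 can be sketched under one plausible reading of the curvature formula, E_G = div(∇I/|∇I|), with finite differences along the horizontal (m) and vertical (n) directions; this is an assumed reading of the claim, not a confirmed reproduction.

```python
import numpy as np

def curvature_edge_map(img, eps=1e-8):
    """Edge ground truth via the curvature formula E_G = div(grad I / |grad I|),
    with the gradient taken along the horizontal (m) and vertical (n)
    directions by finite differences; eps avoids division by zero."""
    g_m = np.gradient(img, axis=1)          # horizontal (m) derivative
    g_n = np.gradient(img, axis=0)          # vertical (n) derivative
    mag = np.sqrt(g_m ** 2 + g_n ** 2) + eps
    return np.gradient(g_m / mag, axis=1) + np.gradient(g_n / mag, axis=0)
```

On a linear intensity ramp the normalised gradient is constant, so its divergence (and hence the curvature edge response) vanishes, while intensity steps produce a nonzero response along the edge.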
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310517330.7A CN116596815A (en) | 2023-05-09 | 2023-05-09 | Image stitching method based on multi-stage alignment network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116596815A true CN116596815A (en) | 2023-08-15 |
Family
ID=87594825
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310517330.7A Pending CN116596815A (en) | 2023-05-09 | 2023-05-09 | Image stitching method based on multi-stage alignment network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116596815A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117710207A (en) * | 2024-02-05 | 2024-03-15 | 天津师范大学 | Image stitching method based on progressive alignment and interweaving fusion network |
CN117710207B (en) * | 2024-02-05 | 2024-07-12 | 天津师范大学 | Image stitching method based on progressive alignment and interweaving fusion network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110570371B (en) | Image defogging method based on multi-scale residual error learning | |
CN105761233A (en) | FPGA-based real-time panoramic image mosaic method | |
CN107734268A | A structure-preserving wide-baseline video stitching method | |
CN108171735B (en) | Billion pixel video alignment method and system based on deep learning | |
CN111626927B (en) | Binocular image super-resolution method, system and device adopting parallax constraint | |
CN112288628B (en) | Aerial image splicing acceleration method and system based on optical flow tracking and frame extraction mapping | |
CN106910208A | A scene image stitching method in the presence of moving targets | |
CN105488777A (en) | System and method for generating panoramic picture in real time based on moving foreground | |
CN116596815A (en) | Image stitching method based on multi-stage alignment network | |
CN106846249A | A panoramic video stitching method | |
CN114820408A (en) | Infrared and visible light image fusion method based on self-attention and convolutional neural network | |
CN105069749A (en) | Splicing method for tire mold images | |
CN111654621B (en) | Dual-focus camera continuous digital zooming method based on convolutional neural network model | |
CN110838086A (en) | Outdoor image splicing method based on correlation template matching | |
CN116152068A (en) | Splicing method for solar panel images | |
CN103793891A (en) | Low-complexity panorama image joint method | |
CN107330856B (en) | Panoramic imaging method based on projective transformation and thin plate spline | |
CN113112404A (en) | Image splicing method and device based on sliding window | |
CN117173012A (en) | Unsupervised multi-view image generation method, device, equipment and storage medium | |
Lai et al. | Hyperspectral Image Super Resolution With Real Unaligned RGB Guidance | |
CN111047513A (en) | Robust image alignment method and device for cylindrical panoramic stitching | |
WO2022247394A1 (en) | Image splicing method and apparatus, and storage medium and electronic device | |
CN115578260A (en) | Attention method and system for direction decoupling for image super-resolution | |
CN113450394B (en) | Different-size image registration method based on Siamese network | |
CN115249206A (en) | Image super-resolution reconstruction method of lightweight attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||