CN117710215A - Binocular image super-resolution method based on epipolar window attention - Google Patents

Binocular image super-resolution method based on epipolar window attention

Info

Publication number: CN117710215A
Application number: CN202410029544.4A
Authority: CN (China)
Prior art keywords: resolution, features, image, attention, feature
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN117710215B
Inventors: 张红英, 李雪, 黄孝茹
Assignee (original and current): Southwest University of Science and Technology
Application filed 2024-01-09 by Southwest University of Science and Technology
Priority: CN202410029544.4A
Publication of CN117710215A: 2024-03-15
Publication of CN117710215B (grant): 2024-06-04

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The invention provides a binocular image super-resolution method based on epipolar window attention. First, to balance global image perception with attention to local information, a hybrid feature extractor is designed, consisting of a shifted-window multi-head self-attention module and a feature distillation and enhancement module; this combination effectively captures the more discriminative high-frequency features of the image. Then, to handle the offset of complementary pixels between the two views, an epipolar window attention is designed that partitions windows along the epipolar direction to promote the matching of shifted pixels; even in smooth regions, neighboring pixels within a window serve as references, enabling more accurate pixel matching. Finally, a residual epipolar window attention mechanism is designed to re-fuse the fused parallax features, while the super-resolution binocular images are reconstructed through sub-pixel convolution. The invention achieves excellent super-resolution performance with a small parameter count, recovers more discriminative image features, and shows good robustness.

Description

Binocular image super-resolution method based on epipolar window attention
Technical Field
The invention relates to image processing technology, and in particular to a binocular image super-resolution method based on epipolar window attention.
Background
Deep-learning-based binocular image super-resolution reconstruction is an important research direction in computer vision; its core idea is to infer high-resolution image information from the two views of a stereo pair. Improving the quality of low-resolution binocular images has long been a concern in the field: early binocular super-resolution methods used a binocular camera or similar device to acquire two low-resolution images and then matched and fused them to obtain high-resolution images for the two viewpoints. The role of binocular super-resolution reconstruction in 3D vision is drawing increasing attention across disciplines, and in the super-resolution track of the New Trends in Image Restoration and Enhancement (NTIRE) challenge the binocular super-resolution task has shown excellent results, demonstrating that the cross-view correspondence contained in a binocular image pair can be exploited to improve super-resolution reconstruction performance.
However, low-resolution binocular images suffer from missing high-frequency information and large parallax variation, which makes fusing binocular correspondence information in super-resolution reconstruction very challenging. Specifically, the task faces the following problems. First, recovering high-frequency characteristics: binocular super-resolution reconstruction must recover a high-resolution depth map from the low-resolution left and right views, which involves restoring image depth information such as object edges and details; a major challenge is therefore how to effectively restore and enhance the high-frequency characteristics of the image. Second, merging left- and right-view information: binocular super-resolution reconstruction requires merging information from both views to increase the accuracy of depth estimation, which calls for an efficient information fusion strategy that integrates the two views into a better depth map. Third, large model size: binocular super-resolution reconstruction usually relies on a deep neural network to learn the low-to-high-resolution mapping, and because high-resolution depth images have complex structures, capturing these features demands substantial computational resources during training and inference.
Addressing these problems, NAFSSR recently used NAFNet to extract multi-scale image features, PASSRnet and iPASSR fuse view information well via parallax attention, and SwinFSSR aligns complementary view information with residual cross-attention. Although these methods represent substantial progress, they do not balance long-range feature dependencies with attention to local information during high-frequency feature recovery, and when the parallax attention mechanism fuses complementary information it overlooks the vertical offset of complementary pixels along the epipolar line. Designing an efficient and lightweight binocular super-resolution reconstruction model is therefore of significant research interest.
Disclosure of Invention
The invention aims to solve the problems of high-frequency feature recovery and binocular complementary-information fusion in binocular image super-resolution reconstruction, and provides a binocular image super-resolution method based on epipolar window attention.
To achieve the above object, the present invention provides a binocular image super-resolution method based on epipolar window attention, consisting of seven parts: the first part preprocesses the binocular image data set; the second part extracts shallow features from the left and right low-resolution images; the third part extracts deep features from the left and right images after shallow feature extraction; the fourth part aligns the left and right feature maps after deep feature extraction using the epipolar window attention; the fifth part iterates the third and fourth parts and re-fuses the outputs of all fourth-part stages; the sixth part reconstructs the super-resolution left and right images; the seventh part trains and tests the epipolar-window-attention-based binocular super-resolution network model, finally obtaining the reconstructed high-resolution left and right views. Specifically:
the first part comprises two steps:
Step 1: download the public binocular data sets Flickr1024 and Middlebury, select 860 binocular image pairs as the high-resolution samples of the training set, and downsample the high-resolution pairs by bicubic interpolation to obtain the corresponding low-resolution training samples;
Step 2: crop the high- and low-resolution samples into one-to-one corresponding image blocks, the low-resolution patches being 30×90 and the 4× high-resolution patches 120×360, and apply rotation, translation, occlusion and channel-shuffle operations to the cropped patches to augment the training set and avoid overfitting, forming the final training samples;
the second part comprises a step of:
Step 3: pass the low-resolution training samples of step 2 through a weight-shared 3×3 convolution layer, mapping each image from the low-dimensional RGB space to a 94-channel high-dimensional space and preliminarily obtaining shallow features L1 and R1 of the left and right images;
the third part comprises a step of:
Step 4: with the shallow features L1 and R1 of the left and right images obtained in step 3 as input, use the residual Swin Transformer feature distillation and enhancement module RSTFB to extract rich semantic features of the images, obtaining deep features L2.1 and R2.1;
the fourth part comprises a step of:
Step 5: with the deep features L2.1 and R2.1 obtained in step 4 as input, perform left-right feature alignment using the epipolar window attention EWA to obtain aligned features L3.1 and R3.1;
the fifth part comprises two steps:
Step 6: feed the output of step 5 back as the input of step 4 and iterate steps 4 and 5 in sequence 7 times, obtaining intermediate features L2.n and R2.n from step 4 and intermediate features L3.n and R3.n from step 5, where n = 2, 3, 4, 5, 6, 7, 8;
Step 7: re-fuse the binocular features L3.1 and R3.1 aligned in step 5 with the intermediate features L3.n and R3.n produced during step 6 using the residual epipolar window attention REWA, obtaining fused feature maps L4 and R4 and enhancing the expressive power of the parallax features;
the sixth part includes a step of:
Step 8: map the fused feature maps L4 and R4 from step 7 to RGB space with a sub-pixel convolution layer, obtaining feature maps L5 and R5;
Step 9: upsample the input low-resolution left and right views by bicubic interpolation to obtain upsampled feature maps L6 and R6, add L6 to L5 from step 8 by matrix addition as a residual operation to reconstruct the high-resolution left view, and likewise add R6 to R5 to reconstruct the high-resolution right view;
the seventh part comprises two steps:
Step 10: feed the training samples of step 2 into the network of steps 3 to 9 and set the network hyperparameters: learning rate 2e-4, 60 epochs, batch size 8, Adam optimizer, MSE loss; train the network to obtain the final binocular image super-resolution pre-trained model;
Step 11: input the public test sets and real low-resolution binocular images into the pre-trained model obtained in step 10; the network simultaneously reconstructs the super-resolution binocular images.
The invention provides a binocular image super-resolution method based on epipolar window attention. First, for feature extraction, to balance global image perception with attention to local information, a hybrid feature extractor RSTFB is designed to extract multi-scale network features; the RSTFB consists of multi-head self-attention over shifted and non-shifted windows together with a feature distillation and enhancement module FDEB, a combination that effectively captures the more discriminative high-frequency features of the image. Then, for the offset of complementary pixels between the binocular views, an epipolar window attention mechanism EWA is designed: the EWA partitions windows along the epipolar direction to improve the matching of shifted pixels, and even in smooth regions neighboring pixels within a window serve as references for more accurate matching. Finally, a residual epipolar window attention mechanism REWA is designed to fuse all the parallax features produced by the EWA stages, while the super-resolution binocular images are reconstructed through sub-pixel convolution. The invention uses the hybrid feature extractor RSTFB, which combines the strengths of the Transformer and the convolutional neural network, to extract rich semantic features of the binocular images and establish long-range dependencies; by partitioning windows along the epipolar direction for view alignment, the network focuses more on the high-frequency features of the image, achieves excellent super-resolution performance with few parameters, recovers more discriminative image features, and shows good robustness.
Drawings
FIG. 1 is the overall network framework of the present invention;
FIG. 2 shows the hybrid feature extractor of the present invention;
FIG. 3 shows the epipolar window attention EWA of the present invention;
FIG. 4 shows the residual epipolar window attention REWA of the present invention;
FIG. 5 is a low-resolution binocular image;
FIG. 6 is the super-resolution binocular image obtained by processing FIG. 5 with the present invention.
Detailed Description
For a better understanding of the present invention, the binocular image super-resolution reconstruction method based on epipolar window attention is described in more detail below with reference to specific embodiments. In the following description, details of the prior art that might obscure the subject matter of the invention are omitted.
Step 1, downloading binocular public data sets Flickr1024 and Middlebury, selecting 860 groups of binocular image pairs as high-resolution image samples in a training set, and then performing bicubic interpolation downsampling on the binocular high-resolution image pairs to obtain low-resolution image samples in the training set;
Step 2: crop the high- and low-resolution samples into one-to-one corresponding image blocks, the low-resolution patches being 30×90 and the 4× high-resolution patches 120×360, and apply rotation, translation, occlusion and channel-shuffle operations to the cropped patches to augment the training set and avoid overfitting, forming the final training samples;
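For illustration, steps 1 and 2 can be sketched as follows: a minimal Pillow/NumPy sketch of the bicubic downsampling and one aligned random crop, with the rotation, translation, occlusion and channel-shuffle augmentations omitted; the function name make_training_pair is illustrative, not the patent's code.

```python
import numpy as np
from PIL import Image

def make_training_pair(hr_img: Image.Image, scale: int = 4, rng=None):
    """Bicubic 4x downsample, then one aligned LR/HR patch pair (30x90 / 120x360)."""
    rng = rng or np.random.default_rng()
    lr = hr_img.resize((hr_img.width // scale, hr_img.height // scale), Image.BICUBIC)
    ph, pw = 30, 90                                    # LR patch: height 30, width 90
    x = int(rng.integers(0, lr.width - pw + 1))        # random crop position
    y = int(rng.integers(0, lr.height - ph + 1))
    lr_patch = lr.crop((x, y, x + pw, y + ph))         # 30 x 90
    hr_patch = hr_img.crop((x * scale, y * scale,
                            (x + pw) * scale, (y + ph) * scale))  # 120 x 360
    return lr_patch, hr_patch
```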
FIG. 1 shows the overall network framework of the binocular image super-resolution reconstruction method based on epipolar window attention; in this embodiment, the method proceeds according to the following steps:
Step 3: pass the low-resolution training samples of step 2 through a weight-shared 3×3 convolution layer, mapping each image from the low-dimensional RGB space to a 94-channel high-dimensional space and preliminarily obtaining shallow features L1 and R1 of the left and right images;
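In framework terms, the weight sharing of step 3 amounts to applying a single convolution module to both views; a minimal PyTorch sketch (tensor sizes follow the 30×90 patches of step 2; the variable names are illustrative):

```python
import torch
import torch.nn as nn

left_lr = torch.rand(1, 3, 30, 90)    # dummy low-resolution left view (B, RGB, H, W)
right_lr = torch.rand(1, 3, 30, 90)   # dummy low-resolution right view

shallow = nn.Conv2d(3, 94, kernel_size=3, padding=1)  # RGB -> 94-channel space
L1 = shallow(left_lr)                 # reusing one module on both views is
R1 = shallow(right_lr)                # exactly what "shared weights" means here
```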
Step 4: with the shallow features L1 and R1 of the left and right images obtained in step 3 as input, extract rich semantic features of the images with the residual Swin Transformer feature distillation and enhancement module RSTFB to obtain deep features L2.1 and R2.1, implemented as follows:
Step 4.1: as shown in FIG. 2(a), the hybrid feature extractor RSTFB first extracts deep features from the shallow features L1 and R1 through 6 Swin Transformer FDEB (STFL) layers; an overlapping cross-attention module then establishes long-range dependencies among the image features; finally, a 3×3 convolution layer aggregates the features of different depths, and a residual connection is introduced to improve the expressive power of the network;
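The wiring of step 4.1 can be summarized in a short sketch; the STFL and overlapping cross-attention internals are passed in as factories, since only their arrangement is described at this point, and the class and argument names are illustrative:

```python
import torch.nn as nn

class RSTFB(nn.Module):
    """6 STFL layers -> overlapping cross-attention -> 3x3 conv, with a block residual."""
    def __init__(self, channels, make_stfl, make_oca):
        super().__init__()
        self.stfls = nn.Sequential(*[make_stfl() for _ in range(6)])
        self.oca = make_oca()                      # overlapping cross-attention module
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        y = self.conv(self.oca(self.stfls(x)))    # aggregate features of different depths
        return x + y                               # residual connection
```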
Step 4.2: to address the Transformer's lack of attention to local features, a feature distillation and enhancement module FDEB is introduced into the Swin Transformer layer to form the STFL, as shown in FIG. 2(b);
Step 4.3: as shown in FIG. 2(c), the FDEB first uses a 1×1 convolution layer and a GELU to expand the feature channels from C to 2C, giving the enhanced feature E1; a 3×3 convolution layer and a GELU then perform feature distillation on E1, reducing the channels back to C to give the distilled feature D1, while a 1×1 convolution layer and a GELU re-enhance E1 to give the enhanced feature E2; in the same way, distilled feature D2 and enhanced feature E3 are obtained from E2, and distilled feature D3 and enhanced feature E4 from E3; finally, a 1×1 convolution layer aggregates the distilled features D1, D2, D3 and the enhanced feature E4, spatial and channel attention make the network focus on important image information along both the spatial and channel dimensions, a residual connection spans the whole operation, and more discriminative image features are output;
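A sketch of the FDEB of step 4.3: the channel widths (1×1 conv + GELU enhancing C to 2C, 3×3 conv + GELU distilling back to C) and the concat-then-1×1 aggregation follow the text, while the exact forms and order of the spatial and channel attention are assumptions (a squeeze-and-excitation-style channel attention and a single-conv spatial attention are used here):

```python
import torch
import torch.nn as nn

class FDEB(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        enh = lambda cin: nn.Sequential(nn.Conv2d(cin, 2 * c, 1), nn.GELU())
        dis = lambda: nn.Sequential(nn.Conv2d(2 * c, c, 3, padding=1), nn.GELU())
        self.enh1, self.enh2 = enh(c), enh(2 * c)
        self.enh3, self.enh4 = enh(2 * c), enh(2 * c)
        self.dis1, self.dis2, self.dis3 = dis(), dis(), dis()
        self.agg = nn.Conv2d(5 * c, c, 1)      # concat(D1, D2, D3, E4): 3C + 2C -> C
        self.ca = nn.Sequential(               # channel attention (assumed SE-style)
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c // 4, 1), nn.GELU(),
            nn.Conv2d(c // 4, c, 1), nn.Sigmoid())
        self.sa = nn.Sequential(               # spatial attention (assumed form)
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        e1 = self.enh1(x)
        d1, e2 = self.dis1(e1), self.enh2(e1)
        d2, e3 = self.dis2(e2), self.enh3(e2)
        d3, e4 = self.dis3(e3), self.enh4(e3)
        y = self.agg(torch.cat([d1, d2, d3, e4], dim=1))
        y = y * self.ca(y)                     # channel dimension
        pooled = torch.cat([y.mean(1, keepdim=True), y.amax(1, keepdim=True)], dim=1)
        y = y * self.sa(pooled)                # spatial dimension
        return x + y                           # residual over the whole block
```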
Step 5: with the deep features L2.1 and R2.1 obtained in step 4 as input, perform left-right feature alignment using the epipolar window attention EWA to obtain aligned features L3.1 and R3.1, implemented as follows:
Step 5.1: as shown in FIG. 3, the epipolar window attention EWA divides each of the input deep features L2.1 and R2.1, of size H×W×C, into 6 windows along the W direction, each window of size H×W0×C with W0 = W/6, and aligns the left and right window features X-L2 and X-R2 of the same feature region with the cross-view attention module CAM;
Step 5.2: as shown in FIG. 3(a), so that the CAM can adaptively re-weight the feature channels and effectively exploit the complementary information between views, a channel attention convolution CAC is introduced into the parallax attention; as shown in FIG. 3(b), the CAC places a GELU between two 3×3 convolution layers, the first expanding the channels from C to 2C and the second reducing them from 2C back to C; finally, all CAM-aligned left and right window features Y-L2 and Y-R2 are concatenated along the W0 dimension to obtain the aligned binocular features L3.1 and R3.1;
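A sketch of steps 5.1 and 5.2 under stated assumptions: the split along W into 6 windows (W0 = W/6, W assumed divisible by 6) and the CAC shape (3×3 conv C to 2C, GELU, 3×3 conv 2C to C) follow the text; the CAM is written as a generic row-wise, parallax-style cross-view attention with the CAC as its query/key projection, and the per-window residual add is illustrative, since the exact CAM projections are not spelled out here:

```python
import torch
import torch.nn as nn

class CAC(nn.Module):
    """Channel attention convolution: 3x3 conv C->2C, GELU, 3x3 conv 2C->C."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, 2 * c, 3, padding=1), nn.GELU(),
                                  nn.Conv2d(2 * c, c, 3, padding=1))

    def forward(self, x):
        return self.body(x)

class EWA(nn.Module):
    """Epipolar window attention: split along W, cross-view attention per window."""
    def __init__(self, c, n_windows=6):
        super().__init__()
        self.n = n_windows
        self.q, self.k = CAC(c), CAC(c)

    def cam(self, a, b):
        # Each pixel of view `a` attends over the same-row pixels of view `b`
        # inside the window (image rows play the role of epipolar lines).
        q = self.q(a).permute(0, 2, 3, 1)                 # B, H, w0, C
        k = self.k(b).permute(0, 2, 1, 3)                 # B, H, C, w0
        v = b.permute(0, 2, 3, 1)                         # B, H, w0, C
        attn = torch.softmax(q @ k / q.shape[-1] ** 0.5, dim=-1)
        return (attn @ v).permute(0, 3, 1, 2)             # back to B, C, H, w0

    def forward(self, left, right):
        lw, rw = left.chunk(self.n, 3), right.chunk(self.n, 3)
        l_out = torch.cat([l + self.cam(l, r) for l, r in zip(lw, rw)], dim=3)
        r_out = torch.cat([r + self.cam(r, l) for l, r in zip(lw, rw)], dim=3)
        return l_out, r_out
```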
Step 6: feed the output of step 5 back as the input of step 4 and iterate steps 4 and 5 in sequence 7 times, obtaining intermediate features L2.n and R2.n from step 4 and intermediate features L3.n and R3.n from step 5, where n = 2, 3, 4, 5, 6, 7, 8;
Step 7: re-fuse the binocular features L3.1 and R3.1 aligned in step 5 with the intermediate features L3.n and R3.n produced during step 6 using the residual epipolar window attention REWA, obtaining fused feature maps L4 and R4 and enhancing the expressive power of the parallax features; as shown in FIG. 4, the REWA aggregates the aligned features of different depths by matrix addition of the outputs of all EWA stages, L4 being obtained by the REWA from L3.1 and L3.n, and R4 from R3.1 and R3.n;
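Per the text, the REWA aggregation is element-wise (matrix) addition of all EWA outputs; a short sketch with dummy tensors (any further processing inside the REWA shown in FIG. 4 is not reproduced here):

```python
import torch

# stand-ins for [L3.1, ..., L3.8] and [R3.1, ..., R3.8]
left_aligned = [torch.rand(1, 94, 30, 90) for _ in range(8)]
right_aligned = [torch.rand(1, 94, 30, 90) for _ in range(8)]

L4 = torch.stack(left_aligned).sum(dim=0)    # matrix addition across depths
R4 = torch.stack(right_aligned).sum(dim=0)
```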
Step 8: map the fused feature maps L4 and R4 from step 7 to RGB space with a sub-pixel convolution layer, obtaining feature maps L5 and R5;
Step 9: upsample the input low-resolution left and right views by bicubic interpolation to obtain upsampled feature maps L6 and R6, add L6 to L5 from step 8 by matrix addition as a residual operation to reconstruct the high-resolution left view, and likewise add R6 to R5 to reconstruct the high-resolution right view;
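Steps 8 and 9 form a standard sub-pixel reconstruction head with a bicubic global skip; in this sketch the 94-channel width and the ×4 scale come from the text, while the conv-then-PixelShuffle layout is the usual realization of a sub-pixel convolution layer and is an assumption here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionHead(nn.Module):
    def __init__(self, c=94, scale=4):
        super().__init__()
        self.expand = nn.Conv2d(c, 3 * scale ** 2, 3, padding=1)  # to RGB x scale^2
        self.shuffle = nn.PixelShuffle(scale)                     # sub-pixel convolution
        self.scale = scale

    def forward(self, feat, lr_view):
        sr = self.shuffle(self.expand(feat))                      # L5 / R5
        up = F.interpolate(lr_view, scale_factor=self.scale,
                           mode='bicubic', align_corners=False)   # L6 / R6
        return sr + up                                            # residual reconstruction
```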
Step 10: feed the training samples of step 2 into the network of steps 3 to 9 and set the network hyperparameters: learning rate 2e-4, 60 epochs, batch size 8, Adam optimizer, MSE loss; train the network to obtain the final binocular image super-resolution pre-trained model;
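The hyperparameters of step 10 translate directly into a training loop; in this sketch, `model` (the network of steps 3 to 9) and `loader` (yielding LR/HR stereo pairs in batches of 8) are assumed to already exist:

```python
import torch

# `model` and `loader` are assumed to be defined elsewhere (see steps 2-9)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # Adam, learning rate 2e-4
criterion = torch.nn.MSELoss()                             # MSE loss

for epoch in range(60):                                    # epochs = 60
    for lr_l, lr_r, hr_l, hr_r in loader:                  # batch size 8
        sr_l, sr_r = model(lr_l, lr_r)                     # both views at once
        loss = criterion(sr_l, hr_l) + criterion(sr_r, hr_r)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```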
Step 11: input the public test sets and real low-resolution binocular images into the pre-trained model obtained in step 10; the network simultaneously reconstructs the super-resolution binocular images.
The invention provides a binocular image super-resolution method based on epipolar window attention, starting from the problems of recovering image detail features and of pixel fluctuation about the epipolar line in binocular images. First, for feature extraction the method adopts a hybrid feature extractor composed of shifted- and non-shifted-window multi-head attention and the FDEB, successfully balancing global image perception with attention to local features. In addition, to handle the offset of complementary pixel information along the epipolar line in the binocular correspondence, the method introduces the epipolar window attention mechanism EWA, which partitions windows along the epipolar direction to enable efficient matching of shifted pixels. The network iterates the hybrid feature extractor RSTFB and the feature-fusion EWA several times to generate high-frequency image features; finally, the REWA fuses all the complementary parallax attention maps, while the reconstruction module produces the binocular super-resolution images. The algorithm is lightweight, delivers excellent super-resolution performance, and suits real degraded low-resolution binocular images.
While the foregoing describes illustrative embodiments of the present invention, the invention is not limited to the scope of those embodiments; various changes apparent to those skilled in the art, made within the spirit and scope of the invention as defined by the appended claims, are all intended to fall within its protection.

Claims (5)

1. A binocular image super-resolution method based on epipolar window attention, characterized in that a hybrid feature extractor is designed to extract more discriminative image features and an epipolar window attention is designed to fuse the complementary features of the binocular views, the method comprising seven parts: data set preprocessing, shallow feature extraction, deep feature extraction, feature fusion, iterated feature extraction and fusion, super-resolution reconstruction, and network model training and testing:
the first part comprises two steps:
Step 1: prepare a binocular data set and downsample the high-resolution binocular samples by bicubic interpolation to obtain the low-resolution training samples;
Step 2: crop the high- and low-resolution images, the low-resolution patches being 30×90 and the 4× high-resolution patches 120×360, and apply rotation, translation, occlusion and channel-shuffle operations to the cropped patches to form the final training samples;
the second part comprises a step of:
Step 3: map the low-resolution training samples of step 2 through a weight-shared 3×3 convolution layer into a 94-channel high-dimensional space, preliminarily obtaining shallow features L1 and R1 of the left and right images;
the third part comprises a step of:
Step 4: with the shallow features L1 and R1 of the left and right images obtained in step 3 as input, extract rich semantic features of the image with the residual Swin Transformer feature distillation and enhancement module RSTFB to obtain deep features L2.1 and R2.1, implemented as follows:
(1) The shallow features L1 and R1 first pass through 6 Swin Transformer FDEB (STFL) layers to extract deep image features; an overlapping cross-attention module then establishes long-range dependencies among the image features; finally, a 3×3 convolution layer aggregates the features of different depths, and a residual connection is introduced to improve the expressive power of the network;
(2) To address the Transformer's lack of attention to local features, a feature distillation and enhancement module FDEB is introduced into the Swin Transformer layer to form the STFL;
(3) In the FDEB, a 1×1 convolution layer and a GELU first expand the feature channels from C to 2C, giving the enhanced feature E1; a 3×3 convolution layer and a GELU perform feature distillation on E1, reducing the channels back to C to give the distilled feature D1, while a 1×1 convolution layer and a GELU re-enhance E1 to give the enhanced feature E2; in the same way, distilled feature D2 and enhanced feature E3 are obtained from E2, and distilled feature D3 and enhanced feature E4 from E3; finally, a 1×1 convolution layer aggregates the distilled features D1, D2, D3 and the enhanced feature E4, spatial and channel attention make the network focus on important image information along both the spatial and channel dimensions, a residual connection spans the whole operation, and more discriminative image features are output;
the fourth part comprises a step of:
Step 5: with the deep features L2.1 and R2.1 obtained in step 4 as input, perform left-right feature alignment using the epipolar window attention EWA to obtain aligned features L3.1 and R3.1, implemented as follows:
(1) The EWA divides each of the input deep features L2.1 and R2.1, of size H×W×C, into 6 windows along the W direction, each window of size H×W0×C with W0 = W/6, and aligns the left and right window features X-L2 and X-R2 of the same feature region with the cross-view attention module CAM;
(2) So that the network can adaptively re-weight the feature channels and the CAM can effectively exploit the complementary information between views, a channel attention convolution CAC is introduced into the parallax attention; the CAC places a GELU between two 3×3 convolution layers, the first expanding the channels from C to 2C and the second reducing them from 2C back to C; finally, all CAM-aligned left and right window features Y-L2 and Y-R2 are concatenated along the W0 dimension to obtain the aligned binocular features L3.1 and R3.1;
the fifth part comprises two steps:
Step 6: feed the output of step 5 back as the input of step 4 and iterate steps 4 and 5 in sequence 7 times, obtaining intermediate features L2.n and R2.n from step 4 and intermediate features L3.n and R3.n from step 5, where n = 2, 3, 4, 5, 6, 7, 8;
Step 7: re-fuse the binocular features L3.1 and R3.1 aligned in step 5 with the intermediate features L3.n and R3.n produced during step 6 using the residual epipolar window attention REWA, obtaining fused feature maps L4 and R4 and enhancing the expressive power of the parallax features; the REWA aggregates the aligned features of different depths by matrix addition of the outputs of all EWA stages, L4 being obtained by the REWA from L3.1 and L3.n, and R4 from R3.1 and R3.n;
the sixth part includes a step of:
Step 8: map the fused feature maps L4 and R4 from step 7 to RGB space with a sub-pixel convolution layer, obtaining feature maps L5 and R5;
Step 9: upsample the input low-resolution left and right views by bicubic interpolation to obtain upsampled feature maps L6 and R6, add L6 to L5 from step 8 by matrix addition as a residual operation to reconstruct the high-resolution left view, and likewise add R6 to R5 to reconstruct the high-resolution right view;
the seventh part comprises two steps:
Step 10: feed the training samples of step 2 into the network of steps 3 to 9 and set the network hyperparameters: learning rate 2e-4, 60 epochs, batch size 8, Adam optimizer, MSE loss; train the network to obtain the final binocular image super-resolution pre-trained model;
Step 11: input the public test sets and real low-resolution binocular images into the pre-trained model obtained in step 10; the network simultaneously reconstructs the super-resolution binocular images.
2. The binocular image super-resolution method based on epipolar window attention according to claim 1, wherein the EWA in step 5(1) divides each of the input deep features L2.1 and R2.1, of size H×W×C, into 6 windows along the W direction.
3. The method of claim 1, wherein in step 5(2) all CAM-aligned left and right window features Y-L2 and Y-R2 are concatenated along the W0 dimension to obtain the aligned binocular features L3.1 and R3.1.
4. The binocular image super-resolution method based on epipolar window attention according to claim 1, wherein in step 6 the output of step 5 is used as the input of step 4, and steps 4 and 5 are iterated in sequence 7 times.
5. The binocular image super-resolution method of claim 1, wherein in step 7 the REWA sums the outputs of all EWA stages by matrix addition to aggregate the aligned features of different depths.
CN202410029544.4A | 2024-01-09 (priority and filing date) | Binocular image super-resolution method based on epipolar window attention | Active | granted as CN117710215B

Priority Applications (1)

Application Number: CN202410029544.4A | Priority Date: 2024-01-09 | Filing Date: 2024-01-09 | Title: Binocular image super-resolution method based on epipolar window attention

Publications (2)

Publication Number | Publication Date
CN117710215A | 2024-03-15
CN117710215B | 2024-06-04

Family

ID=90160876

Family Applications (1)

Application Number: CN202410029544.4A | Title: Binocular image super-resolution method based on epipolar window attention | Priority/Filing Date: 2024-01-09 | Status: Active; granted as CN117710215B

Country Status (1)

Country: CN | CN117710215B

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114881858A * | 2022-05-17 | 2022-08-09 | 东南大学 | Lightweight binocular image super-resolution method based on multi-attention-mechanism fusion
WO2022241995A1 * | 2021-05-18 | 2022-11-24 | 广东奥普特科技股份有限公司 | Visual image enhancement generation method and system, device, and storage medium
CN116309072A * | 2023-03-29 | 2023-06-23 | 西南科技大学 | Binocular image super-resolution method with feature channel separation and fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
雷鹏程, 刘丛, 唐坚刚, 彭敦陆: "分层特征融合注意力网络图像超分辨率重建" [Image super-resolution reconstruction via a hierarchical feature fusion attention network], 中国图象图形学报 (Journal of Image and Graphics), no. 09, 16 September 2020 *

Also Published As

Publication number | Publication date
CN117710215B | 2024-06-04

Similar Documents

Liu et al. FCFR-Net: Feature fusion based coarse-to-fine residual learning for depth completion
Fang et al. A hybrid network of CNN and Transformer for lightweight image super-resolution
CN110570353A Single-image super-resolution reconstruction method using a densely connected generative adversarial network
CN111242238B RGB-D image salient object acquisition method
CN107358576A Depth map super-resolution reconstruction method based on convolutional neural networks
CN109785236B Image super-resolution method based on superpixels and convolutional neural network
CN112767253B Multi-scale feature fusion binocular image super-resolution reconstruction method
CN111626927B Binocular image super-resolution method, system and device adopting parallax constraint
Li et al. DLGSANet: Lightweight dynamic local and global self-attention networks for image super-resolution
CN112785502B Light-field image super-resolution method for a hybrid camera based on texture transfer
Zhou et al. Image super-resolution based on dense convolutional auto-encoder blocks
Zhang et al. Removing foreground occlusions in light field using micro-lens dynamic filter
CN114926337A Single-image super-resolution reconstruction method and system based on a CNN and Transformer hybrid network
Shi et al. IDPT: Interconnected dual pyramid transformer for face super-resolution
CN111080533B Digital zooming method based on self-supervised residual perception network
Zuo et al. Gradient-guided single image super-resolution based on joint trilateral feature filtering
CN112598604A Blind face restoration method and system
CN117710215B Binocular image super-resolution method based on epipolar window attention
CN116309072A Binocular image super-resolution method with feature channel separation and fusion
Han et al. Two-stage network for single image super-resolution
Liao et al. TransRef: Multi-scale reference embedding transformer for reference-guided image inpainting
CN116485654A Lightweight single-image super-resolution reconstruction method combining a convolutional neural network and a Transformer
CN116152060A Dual-feature-fusion-guided depth image super-resolution reconstruction method
CN116703719A Face super-resolution reconstruction device and method based on 3D facial prior information
Wu et al. Infrared and visible light dual-camera super-resolution imaging with texture transfer network

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant