WO2023206343A1 - An image super-resolution method based on an image pre-training strategy - Google Patents

An image super-resolution method based on an image pre-training strategy

Info

Publication number
WO2023206343A1
WO2023206343A1 PCT/CN2022/090211
Authority
WO
WIPO (PCT)
Prior art keywords
module
image
resolution
attention
feature extraction
Prior art date
Application number
PCT/CN2022/090211
Other languages
English (en)
French (fr)
Inventor
陈翔宇
王鑫涛
周建涛
董超
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 filed Critical 中国科学院深圳先进技术研究院
Priority to PCT/CN2022/090211 priority Critical patent/WO2023206343A1/zh
Publication of WO2023206343A1 publication Critical patent/WO2023206343A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting

Definitions

  • the present invention relates to the field of image processing technology, and more specifically, to an image super-resolution method based on an image pre-training strategy.
  • Image super-resolution (Image Super-Resolution) refers to the recovery of a high-resolution image from a low-resolution image or image sequence. It has important application value in image processing, computational photography, high-definition film and television, security surveillance and other fields. Image super-resolution can be achieved using models such as convolutional neural networks (CNN) or Transformers. Taking the Transformer as an example, it is a network structure based on the self-attention mechanism; owing to its powerful representation ability, it has shown great potential in the field of computer vision. Beyond its successful applications in many high-level tasks such as image classification, detection and segmentation, many Transformer-based methods have recently emerged in the field of low-level image processing.
  • the existing image super-resolution method based on the Transformer structure (i.e. SwinIR) still has shortcomings. This is because existing methods can only utilize information from a limited input range, and in this respect are even inferior to some traditional CNN-based methods such as RCAN. On the other hand, since existing methods use window-based self-attention, the obtained intermediate features often have obvious blocking artifacts, which prevents the network from learning more accurate features and a good restoration result. In addition, Transformer-based structures usually require large-scale data for training, and the existing commonly used datasets are no longer sufficient to exploit the potential of such structures.
  • the current image super-resolution method based on the Transformer structure utilizes a very limited range of input information, and obvious blocking artifacts can be observed in the intermediate features of the network, which means that the performance this type of method can achieve in image super-resolution is still limited and leaves much room for improvement.
  • the purpose of the present invention is to overcome the above-mentioned shortcomings of the prior art and provide an image super-resolution method based on an image pre-training strategy.
  • the method includes the following steps:
  • the image super-resolution model sequentially includes a shallow feature extraction module, a deep feature extraction module and a high-resolution reconstruction module, where the shallow feature extraction module extracts shallow features from the input target image.
  • the deep feature extraction module uses the channel attention mechanism and self-attention mechanism to extract multi-level intermediate features.
  • the high-resolution reconstruction module takes as input the output features from the deep feature extraction module together with the output features from the shallow feature extraction module passed through a residual connection, to obtain a high-resolution image relative to the input target image.
  • the advantage of the present invention is that it can greatly increase the range of input information utilized by the network relative to existing network structures, while largely alleviating the blocking-artifact phenomenon present in the intermediate features of existing methods.
  • the present invention further develops the potential of the network by utilizing a large-scale data pre-training strategy for the same task, and is thus able to better realize the image super-resolution function, significantly surpassing the current state-of-the-art methods in performance.
  • Figure 1 is a flow chart of an image super-resolution method based on an image pre-training strategy according to an embodiment of the present invention
  • Figure 2 is a schematic process diagram of an image super-resolution method based on an image pre-training strategy according to an embodiment of the present invention
  • Figure 3 is a schematic diagram of an overlapping cross-attention module according to an embodiment of the present invention.
  • Figure 4 is a comparison diagram of image effects according to an embodiment of the present invention.
  • any specific values are to be construed as illustrative only and not as limiting. Accordingly, other examples of the exemplary embodiments may have different values.
  • the present invention is described below based on the Transformer network structure as an example.
  • the provided image super-resolution method based on the image pre-training strategy includes the following steps.
  • Step S110 Construct a large-scale training data set using a public data set, in which each sample reflects the correspondence between a low-resolution image and a high-resolution image.
  • large-scale data are collected to construct low-resolution and high-resolution image data pairs.
  • Step S120 Construct an image super-resolution model.
  • an image super-resolution model is constructed based on Transformer, which takes a low-resolution image as input and a high-resolution image (ie, super-resolution image) as output.
  • the architecture of this model is one of the core improvements of the present invention and will be introduced in detail below.
  • Step S130 Pre-train the image super-resolution model using a large-scale training data set.
  • the pre-training process takes the minimization of the set loss function as the optimization goal.
  • the L1 loss function can be used and optimized through the gradient descent method until convergence.
  • the optimization parameters of the model such as weights, biases, etc., can be obtained through the pre-training process.
  • Pre-training refers to using additional data to train part or all of the network before training the target network.
  • Step S140 Construct a small sample training data set based on the target data source.
  • based on the target data source (that is, for a specific usage scenario), a certain number of images are collected, and low-resolution and high-resolution data pairs are obtained through downsampling (or direct acquisition). In this way, a small-sample dataset adapted to the target scenario can be obtained, whose data size is much smaller than that of the training dataset constructed in step S110.
  • Step S150 Use the small sample data set to fine-tune the pre-trained image super-resolution model.
  • a small sample data set is used to fine-tune the pre-trained image super-resolution model.
  • a smaller learning rate can be used (for example, set to 1/20 of the initial learning rate), performing a small number of optimization iterations to obtain the final model.
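The pre-training and fine-tuning procedure of steps S130-S150 can be sketched on a toy problem. This is only an illustrative sketch: the linear "model", the synthetic data, the base learning rate and the iteration counts are stand-ins rather than the patent's actual network or hyperparameters; only the L1 objective, gradient descent, and the 1/20 fine-tuning learning rate come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a single weight vector mapping low-resolution features
# to high-resolution targets (stand-in for the super-resolution network).
w = np.zeros(4)
X = rng.normal(size=(256, 4))            # stand-in large-scale pre-training data
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true                           # stand-in high-resolution targets

def l1_loss(w, X, y):
    return np.mean(np.abs(X @ w - y))

def l1_grad(w, X, y):
    # Subgradient of the L1 loss: mean of sign(residual) weighted by inputs.
    return X.T @ np.sign(X @ w - y) / len(y)

# Pre-training: many iterations at the base learning rate.
lr = 0.1
for _ in range(500):
    w -= lr * l1_grad(w, X, y)

# Fine-tuning: a small number of iterations on a small scenario dataset
# at 1/20 of the base learning rate, as described in step S150.
X_ft = rng.normal(size=(32, 4))
y_ft = X_ft @ w_true
for _ in range(50):
    w -= (lr / 20) * l1_grad(w, X_ft, y_ft)

print(l1_loss(w, X, y))
```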
  • Step S160 Input the low-resolution image to be reconstructed into the trained image super-resolution model to obtain a high-resolution result.
  • the trained model can be used to obtain the super-resolution image reconstruction result corresponding to the input image.
  • pre-training can stimulate the model's greater representation potential, and then use small sample data for fine-tuning, which can further enhance the adaptability to specific scenarios.
  • no such pre-training strategy yet exists in the field of image super-resolution or other low-level image processing fields. This is because existing models have limited representation ability, so even if a pre-training strategy is used, it cannot improve the representation accuracy of the model.
  • the main structure of the image super-resolution model is as follows: the low-resolution picture (image) to be reconstructed is input into the network, passes through the shallow feature extraction module, the deep feature extraction module and the high-resolution reconstruction module in sequence, and a high-resolution picture is finally output.
  • the shallow feature extraction module can, for example, be implemented directly with a single 3×3 convolutional layer.
  • the obtained shallow features are input to the deep feature extraction module for complex mapping, and are also passed via a residual connection to the output of the deep feature extraction module.
  • the shallow features and deep features are then added and reconstructed through the high-resolution reconstruction module.
  • the high-resolution reconstruction module can be composed of two to three convolutional layers plus one pixel shuffle layer (Pixel Shuffle), finally obtaining a high-resolution image result.
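As a sketch of the pixel-shuffle upsampling step used in the reconstruction module (the convolutional layers are omitted; the (channels, height, width) layout and the array values are illustrative assumptions):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) features into a (C, H*r, W*r) image.

    Each group of r^2 channels is scattered into an r-by-r spatial block,
    which is how the reconstruction module trades channels for resolution.
    """
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)    # split the channel dim into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)  # reorder to (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# A 2x upscale: 4 channels of 3x3 features -> 1 channel of 6x6 output.
feat = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
out = pixel_shuffle(feat, 2)
print(out.shape)  # (1, 6, 6)
```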
  • the deep feature extraction module includes N hybrid attention module groups (Hybrid Attention Group, HAG).
  • Each hybrid attention module group includes M hybrid attention sub-modules (Hybrid Attention Block, HAB), an overlapping cross-attention module (Overlapping Cross-scale Attention Block, OCAB), and a residual connection from input to output.
  • the number N of hybrid attention module groups can be set to one or more. It has been verified that when the value of N is 6, the efficiency is optimal, and when the value of N is 12, the performance reaches saturation.
  • the number M of hybrid attention sub-modules can be set to one or more, for example, the value of M is 6.
  • For each hybrid attention sub-module HAB, the features first undergo a layer normalization (Layer Normalization) operation, and then pass through a channel attention module (Channel Attention Block, CAB) and a window self-attention (Window Self-attention, WSA) module respectively.
  • the two attention modules reorganize and map the input features along the spatial dimension and the channel dimension respectively; the results are added together with a residual connection, pass through a layer normalization layer and a multi-layer perceptron layer, and are then input to the next part (i.e., the next hybrid attention sub-module or the overlapping cross-attention module).
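The HAB data flow described above can be sketched as follows. The layer normalization, the parallel CAB/WSA branches, the residual connections and the final MLP follow the text; the internals of the two attention branches are simplified stand-ins (plain softmax attention over all tokens, and a sigmoid gate from pooled channel statistics), not the patent's exact layers:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 16, 8  # N tokens (spatial positions), C channels

def layer_norm(x):
    mu, sd = x.mean(-1, keepdims=True), x.std(-1, keepdims=True)
    return (x - mu) / (sd + 1e-6)

def window_self_attention(x):
    # Stand-in for WSA: plain softmax attention over all N tokens.
    scores = x @ x.T / np.sqrt(C)
    attn = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    return attn @ x

def channel_attention(x):
    # Stand-in for CAB: sigmoid-gated per-channel reweighting computed
    # from globally pooled statistics (spatial positions averaged out).
    gate = 1.0 / (1.0 + np.exp(-x.mean(0)))
    return x * gate

def mlp(x, W1, W2):
    return np.maximum(x @ W1, 0.0) @ W2

W1 = rng.normal(scale=0.1, size=(C, 2 * C))
W2 = rng.normal(scale=0.1, size=(2 * C, C))

def hab_forward(x):
    # LN -> (CAB + WSA in parallel) -> residual add -> LN -> MLP -> residual.
    h = layer_norm(x)
    x = x + channel_attention(h) + window_self_attention(h)
    return x + mlp(layer_norm(x), W1, W2)

x = rng.normal(size=(N, C))
y = hab_forward(x)
print(y.shape)  # (16, 8)
```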
  • the features first pass through the layer normalization layer and then pass through the overlapping cross-attention module, combined with the residual connection and input to the latter part.
  • the overlapping cross-attention module is shown in Figure 3.
  • Common self-attention mechanisms use a standard window partition when computing window-based attention, that is, X_Q, X_K and X_V are divided in a non-overlapping manner to obtain the query (Q), key (K) and value (V) for computing attention over the input feature X.
  • the proposed overlapping cross-attention module uses the standard window partition to divide X_Q and obtain Q, and then uses an overlapping partition to divide X_K and X_V and obtain K and V. The K and V obtained in this way cover a larger information range than Q, allowing better integration of cross-window information.
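The overlapping partition can be illustrated with a 1-D toy example; the window size of 8 and the overlap of 2 tokens per side are placeholder values chosen for illustration (the text does not fix the exact overlap):

```python
import numpy as np

def standard_windows(x, win):
    # Non-overlapping partition, used for the queries Q.
    return x.reshape(-1, win)

def overlapping_windows(x, win, overlap):
    # Each K/V window is widened by `overlap` tokens on both sides, so a
    # query window can attend to tokens just outside its own boundary.
    pad = np.pad(x, overlap, mode="edge")
    starts = range(0, len(x), win)
    return np.stack([pad[s:s + win + 2 * overlap] for s in starts])

x = np.arange(32, dtype=float)
q_win = standard_windows(x, 8)         # 4 windows of 8 tokens
kv_win = overlapping_windows(x, 8, 2)  # 4 windows of 12 tokens
print(q_win.shape, kv_win.shape)
```

Each widened K/V window fully contains the corresponding Q window plus a margin of neighboring tokens, which is what lets attention integrate cross-window information.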
  • the self-attention mechanism is used to calculate correlations for different parts of the input image (or different area blocks).
  • in existing methods, when computing self-attention for an image block, the corresponding Q, K and V matrices are all calculated from the internal features of that image block; therefore, when calculating self-attention in a window-based manner, there is no window overlap.
  • the present invention proposes for the first time that in the process of calculating matrices K and V, there is overlap between adjacent sliding windows, so that information from different windows can be integrated and the correlation between each image block can be calculated more accurately.
  • the window size is usually set to a smaller size, for example, to 8 or less.
  • the window size is set to a larger size, for example, to 16. It has been verified that this setting significantly improves the representation ability of the model.
  • the present invention can be used for other image restoration or enhancement tasks, such as image deblurring, denoising, rain removal, haze removal, snow removal and reflection removal.
  • the present invention can also be used as a basic structure for multi-frame tasks such as stereo super-resolution (Stereo SR) and video super-resolution.
  • the method using the larger window size of 16 achieves higher PSNR and SSIM on the five evaluation datasets (i.e. Set5, Set14, BSD100, Urban100 and Manga109) than the existing technique using a window size of 8, and the improvement is significant.
  • the present invention integrates the channel attention mechanism into the Transformer module and proposes a hybrid attention module.
  • the introduction of the channel attention mechanism makes it possible to dynamically compute the weights between different channels from the global information of the network's intermediate features, thereby significantly improving the network's ability to utilize global information; see Table 2 below.
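A squeeze-and-excitation-style sketch of such channel attention is given below; the global average pooling, the bottleneck ratio of 4 and the sigmoid gating are common design choices assumed for illustration rather than taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention_block(feat, W_down, W_up):
    """Channel attention on (C, H, W) features.

    Global average pooling summarizes each channel; a small bottleneck
    MLP plus a sigmoid turns those global statistics into per-channel
    weights, which rescale the feature map channel by channel.
    """
    pooled = feat.mean(axis=(1, 2))                   # (C,) global descriptor
    hidden = np.maximum(pooled @ W_down, 0.0)         # channel reduction + ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ W_up)))  # (C,) gates in (0, 1)
    return feat * weights[:, None, None]

C, H, W = 16, 4, 4
feat = rng.normal(size=(C, H, W))
W_down = rng.normal(scale=0.5, size=(C, C // 4))
W_up = rng.normal(scale=0.5, size=(C // 4, C))
out = channel_attention_block(feat, W_down, W_up)
print(out.shape)  # (16, 4, 4)
```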
  • the existing technology can only focus on the information within the current window when calculating self-attention, which is also the root cause of the blocking artifacts.
  • the present invention specifically proposes overlapping cross-attention modules, which can better integrate cross-window information and significantly improve network performance. See Table 2 below.
  • the present invention proposes a large-scale data pre-training strategy based on the same task.
  • This pre-training strategy greatly improves the performance of the network.
  • see Table 3 below, where "EDT strategy" is the prior art and "Our strategy" refers to the embodiment of the present invention.
  • the pre-training strategy of the present invention significantly improves the representation ability of the model, both for the pre-training results and for the fine-tuning results after pre-training, achieving very significant performance gains.
  • the present invention provides targeted technical solutions, including: using a large-window self-attention mechanism to improve the range of calculating the attention matrix and expanding the utilization range of input information;
  • the channel attention module is integrated into the network architecture to give the network the ability to utilize global information;
  • overlapping cross-attention is proposed to further enhance the network's ability to utilize a wider range of information and to better exploit cross-window information, eliminating blocking artifacts as far as possible.
  • the present invention far exceeds the current most advanced image super-resolution methods in both numerical metrics and visual comparisons. It is also highly versatile: it can be used not only for image super-resolution, but the designed modules OCAB and HAB can also be easily adapted to the model structures of other image restoration and enhancement tasks.
  • the pre-training strategy can also be used in other tasks. In addition, by simply modifying the input and output sizes and training data of the model of the present invention, it can be quickly applied to other low-level image processing tasks.
  • the present invention can be used for camera imaging processing and image post-processing, and has application potential in mobile phones, cameras, security surveillance and other fields.
  • the invention may be a system, method and/or computer program product.
  • a computer program product may include a computer-readable storage medium having computer-readable program instructions thereon for causing a processor to implement various aspects of the invention.
  • Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the above.
  • Computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through wires.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage on a computer-readable storage medium in the respective computing/processing device.
  • Computer program instructions for performing operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ or Python, and conventional procedural programming languages such as the C language or similar programming languages.
  • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer (for example, through the Internet using an Internet service provider).
  • in some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the invention.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that these instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus that implements the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause the computer, programmable data processing device and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other equipment implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, program segment, or portion of instructions that comprises one or more executable instructions for implementing the specified logical function(s).
  • In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present invention discloses an image super-resolution method based on an image pre-training strategy. The method includes: acquiring a target image; inputting the target image into a trained image super-resolution model and outputting a super-resolution image. The image super-resolution model comprises, in sequence, a shallow feature extraction module, a deep feature extraction module and a high-resolution reconstruction module, where the shallow feature extraction module extracts shallow features from the input target image, the deep feature extraction module uses a channel attention mechanism and a self-attention mechanism to extract multi-level intermediate features, and the high-resolution reconstruction module takes as input the output features of the deep feature extraction module together with the output features of the shallow feature extraction module passed through a residual connection, obtaining a high-resolution image relative to the input target image. The present invention improves image super-resolution performance in both numerical metrics and visual quality, and is highly versatile.

Description

An image super-resolution method based on an image pre-training strategy
Technical Field
The present invention relates to the field of image processing technology, and more specifically to an image super-resolution method based on an image pre-training strategy.
Background Art
Image super-resolution (Image Super-Resolution) refers to recovering a high-resolution image from a low-resolution image or image sequence, and has important application value in image processing, computational photography, high-definition film and television, security surveillance and other fields. Image super-resolution can be implemented with models such as convolutional neural networks (CNN) or Transformers. Taking the Transformer as an example, it is a network structure based on the self-attention (Self-attention) mechanism; its powerful representation ability has shown great potential in computer vision. Beyond successful applications in many high-level tasks such as image classification, detection and segmentation, many Transformer-based methods have recently emerged in low-level image processing.
Although the Transformer structure appears to have stronger representation ability than convolutional neural networks, the existing Transformer-based image super-resolution method (namely SwinIR) still has shortcomings. This is because existing methods can only exploit information from a limited input range, and in this respect are even inferior to some traditional CNN-based methods such as RCAN. Moreover, since existing methods use window-based self-attention, the resulting intermediate features often exhibit obvious blocking artifacts, which prevents the network from learning a better restoration. In addition, Transformer-based structures usually require large-scale data for training, and the commonly used datasets are no longer sufficient to develop the potential of such structures.
In summary, current Transformer-based image super-resolution methods exploit a very limited range of input information, and obvious blocking artifacts can be observed in the network's intermediate features. This means that the performance such methods can reach in image super-resolution is still limited, and there is large room for improvement.
Summary of the Invention
The purpose of the present invention is to overcome the above shortcomings of the prior art and provide an image super-resolution method based on an image pre-training strategy. The method includes the following steps:
Acquire a target image;
Input the target image into a trained image super-resolution model and output a super-resolution image. The image super-resolution model comprises, in sequence, a shallow feature extraction module, a deep feature extraction module and a high-resolution reconstruction module, where the shallow feature extraction module extracts shallow features from the input target image, the deep feature extraction module uses a channel attention mechanism and a self-attention mechanism to extract multi-level intermediate features, and the high-resolution reconstruction module takes as input the output features of the deep feature extraction module together with the output features of the shallow feature extraction module passed through a residual connection, obtaining a high-resolution image relative to the input target image.
Compared with the prior art, the advantage of the present invention is that it can greatly enlarge the range of input information utilized by the network relative to existing network structures, while substantially alleviating the blocking artifacts present in the intermediate features of existing methods. Furthermore, by using a large-scale data pre-training strategy on the same task, the present invention further develops the network's potential, better realizes the image super-resolution function, and significantly surpasses the current state-of-the-art methods in performance.
Other features and advantages of the present invention will become clear from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the present invention and together with the description serve to explain the principles of the invention.
Figure 1 is a flowchart of an image super-resolution method based on an image pre-training strategy according to an embodiment of the present invention;
Figure 2 is a schematic process diagram of an image super-resolution method based on an image pre-training strategy according to an embodiment of the present invention;
Figure 3 is a schematic diagram of an overlapping cross-attention module according to an embodiment of the present invention;
Figure 4 is a comparison of image results according to an embodiment of the present invention.
Detailed Description of the Embodiments
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions and the values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention or its application or use.
Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
In all the examples shown and discussed here, any specific value should be interpreted as merely illustrative rather than limiting. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it need not be discussed further in subsequent figures.
For clarity, the present invention is described below taking the Transformer network structure as an example. Referring to Figure 1, the provided image super-resolution method based on an image pre-training strategy includes the following steps.
Step S110: construct a large-scale training dataset from publicly available datasets, in which each sample reflects the correspondence between a low-resolution image and a high-resolution image.
First, large-scale data are collected to construct low-resolution and high-resolution image pairs. For example, to save the cost of collecting sample data, publicly available large-scale datasets can be used, or existing datasets can be downloaded through other channels, and low-resolution and high-resolution image pairs can then be obtained by downsampling (e.g. bicubic downsampling), thereby constructing a large-scale training dataset.
Step S120: construct an image super-resolution model.
In one embodiment, the image super-resolution model is constructed based on the Transformer, taking a low-resolution image as input and a high-resolution image (i.e. a super-resolution image) as output. The architecture of this model is one of the core improvements of the present invention and is described in detail below.
Step S130: pre-train the image super-resolution model on the large-scale training dataset.
The pre-training process takes minimization of a chosen loss function as the optimization objective; for example, the L1 loss function can be used and optimized by gradient descent until convergence. The pre-training process yields the optimized parameters of the model, such as weights and biases. Pre-training means using additional data to train part or all of the network before training the target network.
Step S140: construct a small-sample training dataset from the target data source.
In this step, a certain number of images are collected from the target data source (i.e. for a specific usage scenario), and low-resolution and high-resolution pairs are obtained by downsampling (or by direct acquisition). In this way, a small-sample dataset adapted to the target scenario is obtained, whose size is much smaller than the training dataset constructed in step S110.
Step S150: fine-tune the pre-trained image super-resolution model with the small-sample dataset.
To enhance the adaptability or robustness of the image super-resolution model to the target scenario, the pre-trained model is fine-tuned on the small-sample dataset; a smaller learning rate (for example, set to 1/20 of the initial learning rate) and a small number of optimization iterations can be used to obtain the final model.
Step S160: input the low-resolution image to be reconstructed into the trained image super-resolution model to obtain the high-resolution result.
After the above pre-training and fine-tuning, the trained model can be used to obtain the super-resolution reconstruction corresponding to the input image. For the designed image super-resolution model with stronger representation ability, pre-training can unlock greater representation potential, and subsequent fine-tuning on small-sample data further enhances the adaptability to specific scenarios. No such pre-training strategy yet exists in image super-resolution or other low-level image processing fields, because the limited representation ability of existing models means that even a pre-training strategy cannot improve their representation accuracy.
The structure of the image super-resolution model and the image reconstruction process are described below.
As shown in Figure 2, the main structure of the image super-resolution model is as follows: the low-resolution picture (image) to be reconstructed is input into the network and passes in sequence through the shallow feature extraction module, the deep feature extraction module and the high-resolution reconstruction module, finally outputting a high-resolution picture. The shallow feature extraction module can, for example, be implemented directly with a single 3×3 convolutional layer. The resulting shallow features are fed to the deep feature extraction module for complex mapping, and are also passed via a residual connection to the output of the deep feature extraction module; the shallow and deep features are then added and reconstructed by the high-resolution reconstruction module. In one embodiment, the high-resolution reconstruction module can be composed of two to three convolutional layers plus a pixel-shuffle layer (Pixel Shuffle), finally producing the high-resolution image.
The core design of the image super-resolution model lies mainly in the deep feature extraction module. For example, the deep feature extraction module includes N hybrid attention groups (Hybrid Attention Group, HAG). Each hybrid attention group includes M hybrid attention blocks (Hybrid Attention Block, HAB), an overlapping cross-attention module (Overlapping Cross-scale Attention Block, OCAB), and a residual connection from input to output. The number N of hybrid attention groups can be set to one or more; it has been verified that efficiency is optimal when N is 6, and performance saturates when N is 12. The number M of hybrid attention blocks can be set to one or more, for example M is 6.
In each hybrid attention block HAB, the features first undergo a layer normalization (Layer Normalization) operation, and then pass through a channel attention module (Channel Attention Block, CAB) and a window self-attention (Window Self-attention, WSA) module respectively. The two attention modules reorganize and map the input features along the spatial and channel dimensions respectively; their results are added together with a residual connection, pass through a layer normalization layer and a multi-layer perceptron layer, and are then fed to the next part (i.e. the next hybrid attention block or the overlapping cross-attention module).
The overlapping cross-attention module OCAB is similar to the hybrid attention block HAB: the features first pass through a layer normalization layer and then through the overlapping cross-attention computation, and are combined with a residual connection before being fed to the following part.
The overlapping cross-attention module is shown in Figure 3. Common self-attention mechanisms use a standard window partition when computing window-based attention, i.e. X_Q, X_K and X_V are partitioned in a non-overlapping manner to obtain the query (Q), key (K) and value (V) for the input feature X. In the embodiment of Figure 3, the proposed overlapping cross-attention module partitions X_Q with the standard window partition to obtain Q, and partitions X_K and X_V with overlapping windows to obtain K and V. The K and V obtained in this way cover a larger information range than Q and can thus better integrate cross-window information.
It should be noted that the self-attention mechanism computes correlations between different parts (or region blocks) of the input image. In existing self-attention computations, the Q, K and V matrices for an image block are all computed from the internal features of that block, so there is no window overlap in window-based self-attention. The present invention proposes for the first time that, when computing the matrices K and V, adjacent sliding windows overlap, so that information from different windows can be integrated and the correlation between image blocks can be computed more accurately. In addition, the prior art usually sets a small window size, e.g. 8 or less; under the overlapping-window design, the present invention sets a larger window size, e.g. 16. It has been verified that this setting significantly improves the representation ability of the model.
It should be understood that, without departing from the spirit and scope of the present invention, those skilled in the art can make appropriate changes or modifications to the above embodiments. For example, by changing the training dataset and simply adjusting the network input and output sizes, the present invention can be used for other image restoration or enhancement tasks, such as image deblurring, denoising, rain removal, haze removal, snow removal and reflection removal. Furthermore, the present invention can also serve as a basic structure for multi-frame tasks such as stereo super-resolution (Stereo SR) and video super-resolution.
To further verify the effect of the present invention, simulation experiments were conducted. First, a large window size was used for computing self-attention: existing techniques usually use a window size of 8, and at most 12, for the self-attention mechanism. By using a larger window size of 16, the present invention greatly improves the representation ability of the network model at a limited extra computational cost, achieving higher performance. See Table 1 below.
Table 1: Effect of different window sizes on network performance
Figure PCTCN2022090211-appb-000001
As can be seen from Table 1, compared with the prior art using a window size of 8, the method using the larger window size of 16 achieves higher PSNR and SSIM on the five evaluation datasets (i.e. Set5, Set14, BSD100, Urban100 and Manga109), and the improvement is significant.
To address the prior art's inability to dynamically compute inter-channel weights from the global information of the features, the present invention integrates the channel attention mechanism into the Transformer module and proposes the hybrid attention block. Introducing the channel attention mechanism allows the weights between different channels to be computed dynamically from the global information of the network's intermediate features, significantly improving the network's ability to exploit global information; see Table 2 below.
The prior art can only attend to information within the current window when computing self-attention, which is the root cause of the blocking artifacts. The present invention specifically proposes the overlapping cross-attention module, which better integrates cross-window information and brings a clear gain in network performance; see Table 2 below.
Table 2: Effect of OCAB and HAB on performance
        Baseline   w/ OCAB   w/ CAB   Ours
PSNR    27.81      27.91     27.91    27.97
SSIM    0.8336     0.8352    0.8355   0.8366
As can be seen from Table 2, both OCAB and HAB (i.e. the module with CAB) bring a clear performance improvement (up to 0.1 dB), and adding both modules together yields an even more significant gain.
To address the limited performance gains of prior-art pre-training on multiple tasks or multiple degradation levels, the present invention proposes a large-scale data pre-training strategy based on the same task, which greatly improves network performance; see Table 3 below, where "EDT strategy" is the prior art and "Our strategy" refers to the embodiment of the present invention.
Table 3: Effect of different pre-training strategies on performance
Figure PCTCN2022090211-appb-000002
As can be seen from Table 3, the pre-training strategy of the present invention clearly improves the representation ability of the model, both for the pre-training results and for the fine-tuning results after pre-training, achieving very significant performance gains.
In experiments, the present invention shows significant gains in numerical metrics over other existing methods, and the gap is even more obvious on the standard benchmark datasets, especially Urban100 and Manga109. Visual comparisons were also made, as shown in Figure 4, where Figure 4(a) is the low-resolution image, Figure 4(b) is the super-resolution result, and Figure 4(c) is the original high-resolution image. It can be seen that the super-resolution image obtained with the present invention recovers more and clearer texture details.
In summary, through extensive observation of the result images of existing methods and through theoretical and experimental analysis, it was found that existing methods have a very limited ability to exploit input information, an insufficient ability to compute self-attention across windows, and that existing Transformer structures have an enormous demand for training data. The present invention provides targeted technical solutions, including: using a large-window self-attention mechanism to enlarge the range of the attention-matrix computation and extend the utilization of input information; integrating the channel attention module into the network architecture to give the network the ability to exploit global information; proposing overlapping cross-attention to further strengthen the network's use of wider-range information, better exploit cross-window information, and eliminate blocking artifacts as far as possible; and proposing pre-training on large-scale data of the same task to develop the potential of the Transformer structure as far as possible and obtain superior performance. It has been verified that the present invention far surpasses the current state-of-the-art image super-resolution methods in both numerical metrics and visual comparisons. It is also highly versatile: it can be used not only for image super-resolution, but the designed OCAB and HAB modules can easily be adapted to the model structures of other image restoration and enhancement tasks, and the pre-training strategy can be used in other tasks as well. Moreover, by simply modifying the input and output sizes and training data of the model of the present invention, it can quickly be applied to other low-level image processing tasks. At the application level, the present invention can be used for camera imaging processing and image post-processing, with room for use in mobile phones, cameras, security surveillance and other fields.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the above. A computer-readable storage medium, as used here, is not to be interpreted as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g. a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical-fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ or Python, and conventional procedural programming languages such as the C language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA), can be personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device so as to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims (10)

  1. An image super-resolution method based on an image pre-training strategy, comprising the following steps:
    acquiring a target image;
    inputting the target image into a trained image super-resolution model and outputting a super-resolution image, wherein the image super-resolution model comprises, in order from the input, a shallow feature extraction module, a deep feature extraction module, and a high-resolution reconstruction module, wherein the shallow feature extraction module extracts shallow features from the input target image, the deep feature extraction module extracts multi-level intermediate features using a channel attention mechanism and a self-attention mechanism, and the high-resolution reconstruction module takes as input the output features of the deep feature extraction module together with the output features of the shallow feature extraction module conveyed via a residual connection, so as to obtain a high-resolution image relative to the input target image.
  2. The method according to claim 1, wherein the deep feature extraction module comprises a plurality of hybrid attention block groups, each hybrid attention block group comprising a plurality of sequentially connected hybrid attention sub-blocks and one overlapping cross-attention module, wherein the output of the overlapping cross-attention module is fused, via a residual connection, with the input of the first hybrid attention sub-block, and the overlapping cross-attention module obtains, during its self-attention computation, correlations between different image patches based on overlapping windows.
  3. The method according to claim 2, wherein each hybrid attention sub-block comprises a first layer normalization module, a channel attention module, a self-attention module, a second layer normalization module, and a multilayer perceptron, wherein the channel attention module and the self-attention module reorganize and map the output of the first layer normalization module in the spatial dimension and the channel dimension to obtain a first result, the first result being fused with the input of the first layer normalization module conveyed via a residual connection; the first result then passes through the second layer normalization module and the multilayer perceptron in sequence to obtain a second result, the second result being fused with the first result conveyed via a residual connection.
  4. The method according to claim 2, wherein the overlapping cross-attention module comprises a third layer normalization module, an overlapping cross-attention sub-module, a fourth layer normalization module, and a multilayer perceptron, wherein there is a residual connection between the output of the overlapping cross-attention sub-module and the input of the third layer normalization module, and a residual connection between the input of the fourth layer normalization module and the output of the multilayer perceptron.
  5. The method according to claim 1, wherein the image super-resolution model is trained according to the following steps:
    constructing a first training dataset from publicly available datasets, each sample of the first training dataset reflecting a correspondence between a low-resolution image and a high-resolution image;
    pre-training the image super-resolution model on the first training dataset until a set loss function criterion is satisfied; and
    fine-tuning the pre-trained image super-resolution model with a second training dataset, wherein the second training dataset is collected for a target data source and the number of samples in the second training dataset is smaller than that in the first training dataset.
  6. The method according to claim 2, wherein the overlapping cross-attention module computes attention by obtaining the query Q using a standard window partition and obtaining the key K and value V using an overlapping window partition.
  7. The method according to claim 6, wherein, for the overlapping cross-attention module, the window size is set to 16.
  8. The method according to claim 1, wherein the image super-resolution model is built on a Transformer, the shallow feature extraction module comprises one convolutional layer, the high-resolution reconstruction module comprises a plurality of convolutional layers and one pixel shuffle layer, and the number of intermediate feature levels extracted by the deep feature extraction module is set to six.
  9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
  10. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.
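Claim 6 above specifies that the overlapping cross-attention module obtains the query Q from a standard (non-overlapping) window partition while the key K and value V come from an overlapping partition. The following NumPy sketch shows one way such a partitioning could work; the window size, overlap amount, and padding mode are illustrative assumptions, not the claimed implementation (claim 7, for instance, sets the window size to 16):

```python
import numpy as np

def standard_windows(x, win):
    """Non-overlapping win x win windows over an (H, W) map (source of Q)."""
    h, w = x.shape
    return [x[i:i + win, j:j + win]
            for i in range(0, h, win) for j in range(0, w, win)]

def overlapping_windows(x, win, overlap):
    """For each standard window, an enlarged (win + 2*overlap)-sized window
    centred on it (source of K and V); edge padding handles the borders."""
    xp = np.pad(x, overlap, mode="edge")
    size = win + 2 * overlap
    h, w = x.shape
    return [xp[i:i + size, j:j + size]
            for i in range(0, h, win) for j in range(0, w, win)]

x = np.arange(64, dtype=float).reshape(8, 8)
q_wins = standard_windows(x, win=4)                  # four 4x4 query windows
kv_wins = overlapping_windows(x, win=4, overlap=2)   # four 8x8 K/V windows
```

Because each K/V window extends beyond its corresponding query window, attention computed per window can draw on pixels belonging to neighbouring windows, which is the cross-window information flow the claim describes.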
PCT/CN2022/090211 2022-04-29 2022-04-29 An image super-resolution method based on an image pre-training strategy WO2023206343A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/090211 WO2023206343A1 (zh) 2022-04-29 2022-04-29 An image super-resolution method based on an image pre-training strategy

Publications (1)

Publication Number Publication Date
WO2023206343A1 (zh)

Family

ID=88517004

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
CN112419153A (zh) * 2020-11-23 2021-02-26 深圳供电局有限公司 Image super-resolution reconstruction method and apparatus, computer device, and storage medium
CN112862689A (zh) * 2021-03-09 2021-05-28 南京邮电大学 Image super-resolution reconstruction method and system
CN113409191A (zh) * 2021-06-02 2021-09-17 广东工业大学 Lightweight image super-resolution method and system based on an attention feedback mechanism
CN113706388A (zh) * 2021-09-24 2021-11-26 上海壁仞智能科技有限公司 Image super-resolution reconstruction method and apparatus
CN114140353A (zh) * 2021-11-25 2022-03-04 苏州大学 Channel-attention-based Swin-Transformer image denoising method and system


Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 22939192; country of ref document: EP; kind code of ref document: A1)