WO2023109709A1 - An attention-mechanism-based image splicing localization detection method - Google Patents

An attention-mechanism-based image splicing localization detection method Download PDF

Info

Publication number
WO2023109709A1
WO2023109709A1 · PCT/CN2022/138200
Authority
WO
WIPO (PCT)
Prior art keywords
image
edge
stitching
attention mechanism
path
Prior art date
Application number
PCT/CN2022/138200
Other languages
English (en)
French (fr)
Inventor
张玉兰
朱国普
杨建权
刘祖权
Original Assignee
深圳先进技术研究院
中国科学院深圳理工大学(筹)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院, 中国科学院深圳理工大学(筹) filed Critical 深圳先进技术研究院
Publication of WO2023109709A1 publication Critical patent/WO2023109709A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • G06T7/41Analysis of texture based on statistical description of texture
    • G06T7/44Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • The present invention relates to the technical field of image splicing localization detection, and in particular to an image splicing localization detection method based on an attention mechanism.
  • Image splicing is a common method of image tampering. Typically, a region of one image (called the donor image) is copied and, after geometric operations such as scaling and rotation, pasted into a region of another image (called the recipient image). The composite image is then subjected to post-processing operations such as Gaussian filtering and image enhancement so that the spliced region is consistent with the recipient image. This post-processing of spliced-region edges makes splicing localization more challenging.
  • Existing image splicing localization techniques mainly comprise methods based on traditional features and methods based on deep learning.
  • Traditional image splicing localization methods locate the spliced region using characteristics such as sensor pattern noise, the interpolation pattern of the color filter array, or the JPEG compression traces of the spliced region.
  • These traditional methods each target a specific image property and are therefore not applicable to all splicing types.
  • Deep learning-based methods mainly exploit the data-driven power of big data to learn the characteristics of the spliced region and then localize it.
  • However, most existing deep learning methods learn only from spliced images and the corresponding ground-truth masks, ignoring the effect of the image edges on the edges of the spliced region, so the boundaries of the localized region are often unsatisfactory.
  • Moreover, existing deep learning-based methods attend only to the high-level features of the deeper layers of convolutional networks while ignoring the low-level features of shallow layers, resulting in inaccurate splicing localization.
  • Disadvantage 1 of the prior art: it uses only the deep features of the convolutional network, not the low-level features output by shallow layers, so the splicing localization results leave room for improvement.
  • The low-level features output by shallow layers contain the local features of the image and some of its detail; this information can improve the feature representation ability of the network and further improve localization.
  • Disadvantage 2 of the prior art: it utilizes only the spliced image, not the edge information of the image or of the spliced region.
  • The edge information of the image and of the spliced region can guide the boundary of the spliced region and improve the accuracy of splicing-edge localization.
  • Disadvantage 3 of the prior art: it fuses features naively, without recalibration by an attention mechanism, which makes the output features less discriminative, so the localization results leave room for improvement.
  • The technical problem to be solved by the present invention is to overcome the above defects by providing an attention-mechanism-based image splicing localization detection method that: designs a multi-task loss function to learn the edge information of the image, the edge information of the spliced region, and the spliced region simultaneously, improving the localization of splicing edges; uses a shallow network to extract low-level texture features, enhancing the feature representation ability of the proposed network; and finally uses a squeeze-excitation attention mechanism to recalibrate the fused features, so that the model pays more attention to, and assigns greater weight to, the features useful for localizing the spliced region.
  • An attention-mechanism-based image splicing localization detection method comprises the following steps:
  • Step 1: Prepare an image splicing dataset and divide it into three parts: a training set, a validation set, and a test set;
  • Step 2: Design a two-stream multi-task learning neural network structure;
  • Step 3: Design a multi-task loss function;
  • Step 4: Optimize the training to obtain a spliced-region localization model;
  • Step 5: Input the image to be detected into the model trained in Step 4 to obtain the splicing localization result.
  • Step 1 uses four benchmark image splicing datasets, CASIA 1.0 (461 images), CASIA 2.0 (5,123 images), the Carvalho dataset (100 images), and the Columbia dataset (180 images), plus two synthetic splicing datasets, spliced_NIST (13,575 images) and spliced_Dresden (35,712 images). Each dataset is divided into training, validation, and test sets at a ratio of 7:2:1; a minimal split sketch follows.
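A minimal sketch of the 7:2:1 split, assuming each dataset is a folder of image files (the directory layout and function name here are illustrative, not from the patent):

```python
import random
from pathlib import Path

def split_7_2_1(image_dir: str, seed: int = 0):
    """Shuffle the images of one dataset and split them 7:2:1 into
    training, validation, and test lists."""
    paths = sorted(Path(image_dir).glob("*"))  # assumed: one folder per dataset
    random.Random(seed).shuffle(paths)
    n_train = int(0.7 * len(paths))
    n_val = int(0.2 * len(paths))
    return (paths[:n_train],                    # training set
            paths[n_train:n_train + n_val],     # validation set
            paths[n_train + n_val:])            # test set

# Example usage (hypothetical path):
# train, val, test = split_7_2_1("datasets/spliced_NIST")
```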
  • Step 2 comprises an edge-guided path and a label mask path. The edge-guided path is a U-Net encoder-decoder path supervised by the edges of the image; the label mask path is likewise a U-Net encoder-decoder path, supervised by the ground-truth mask of the spliced region and the edges of the spliced region.
  • The multi-task loss function in Step 3 comprises three terms: first a label mask loss, second a mask edge loss, and third an image edge loss.
  • The experiments in Step 4 are implemented with the PyTorch framework on Ubuntu 16.04, with a GeForce GTX 1080 Ti GPU. Adaptive moment estimation is used as the optimizer; the learning rate is set to 1×10⁻³ and reduced to 1×10⁻⁴ after 30 epochs; training runs for 300 epochs in total with a batch size of 8.
  • Compared with the prior art, the present invention has the following advantages: (1) introducing shallow low-level features provides more detail and improves the feature representation ability of the network; (2) introducing the image edges and the spliced-region edges as supervisory information, together with the multi-task loss function, localizes the spliced region more accurately; (3) introducing the squeeze-excitation attention mechanism to recalibrate the fused features makes the model attend to the features that contribute most to localization, yielding more accurate splicing localization results.
  • FIG. 1 is a schematic diagram of the structure of the attention-mechanism-based image splicing localization detection method of the present invention.
  • FIG. 2 is a structural diagram of the feature adaptation layer (FAL) of the method.
  • FIG. 3 is a schematic diagram of the squeeze-excitation attention mechanism (SEAM) of the method.
  • FIG. 4 shows splicing localization results of the method on part of the test sets.
  • FIG. 5 shows the localization results of the method on different splicing datasets.
  • An attention-mechanism-based image splicing localization detection method comprises the following steps:
  • Step 1: Use four benchmark image splicing datasets, CASIA 1.0 (461 images), CASIA 2.0 (5,123 images), the Carvalho dataset (100 images), and the Columbia dataset (180 images), plus two synthetic splicing datasets, spliced_NIST (13,575 images) and spliced_Dresden (35,712 images); divide each dataset into training, validation, and test sets at a ratio of 7:2:1.
  • Step 2: Design a two-stream multi-task learning neural network comprising an edge-guided path and a label mask path.
  • The edge-guided path is a U-Net encoder-decoder path that uses the edges of the image for supervision.
  • The encoder extracts discriminative features from the input spliced image, and the decoder further processes the extracted features to obtain a pixel-wise image edge prediction map.
  • The encoder of the edge-guided path consists of four consecutive groups of convolutional modules and downsampling layers. Each convolutional module consists of a convolutional layer, a batch normalization (BN) layer, and a rectified linear unit (ReLU); the convolutional layers use 3×3 kernels with stride 1. Downsampling is implemented by a convolution with a 4×4 kernel and stride 2.
  • The decoder of the edge-guided path is composed of four consecutive groups of upsampling layers and convolutional modules.
  • The upsampling layers are implemented by bilinear interpolation; the width and height of the feature map double after each upsampling.
  • The encoder and decoder are connected by a convolutional module.
  • Finally, a convolutional layer with a 1×1 kernel and stride 1 is empirically used to refine the upsampled features. Upsampling nevertheless incurs feature loss, so skip connections between the contracting and expanding paths of the U-Net are used to reuse the original features and compensate for that loss. A sketch of these building blocks follows.
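A minimal PyTorch sketch of the edge-guided path as described; the class name and channel widths are illustrative assumptions, not taken from the patent. It uses Conv3×3+BN+ReLU modules, stride-2 4×4 downsampling convolutions, bilinear upsampling with skip connections, and a final 1×1 refinement layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_module(c_in, c_out):
    # Convolutional module: 3x3 convolution (stride 1) + BN + ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

class EdgeGuidedPath(nn.Module):
    """U-Net-style encoder-decoder supervised by image edges (sketch)."""
    def __init__(self, channels=(64, 128, 256, 512)):  # widths are assumptions
        super().__init__()
        self.enc, c_prev = nn.ModuleList(), 3
        for c in channels:
            self.enc.append(nn.ModuleList([
                conv_module(c_prev, c),
                nn.Conv2d(c, c, kernel_size=4, stride=2, padding=1)]))  # downsampling
            c_prev = c
        self.bridge = conv_module(c_prev, c_prev)  # module connecting encoder and decoder
        self.dec = nn.ModuleList()
        for c in reversed(channels):
            self.dec.append(conv_module(c_prev + c, c))  # conv after skip concatenation
            c_prev = c
        self.refine = nn.Conv2d(c_prev, 1, kernel_size=1, stride=1)  # 1x1 refinement

    def forward(self, x):
        skips = []
        for conv, down in self.enc:
            x = conv(x)
            skips.append(x)        # skip connection: reuse pre-downsampling features
            x = down(x)
        x = self.bridge(x)
        for block, skip in zip(self.dec, reversed(skips)):
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            x = block(torch.cat([x, skip], dim=1))
        return torch.sigmoid(self.refine(x))  # pixel-wise edge prediction map
```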
  • The label mask path of the proposed network is structurally similar to the edge-guided path as a whole; it is also a U-Net encoder-decoder path.
  • The ground-truth mask of the spliced region and the edges of the spliced region are used to supervise the label mask path. It differs from the edge-guided path in the following respects:
  • 1) Features from the edge-guided path are filtered by Feature Adaptation Layers (FALs) and then fed into the label mask path.
  • The FAL is composed of a Res-block; its structure is shown in FIG. 2.
  • The FAL contains a convolutional path and an identity path, where the convolutional path consists of a convolutional layer with a 1×1 kernel and stride 1 followed by a ReLU layer.
  • Let the feature input to the FAL be y; the output of the FAL can then be expressed as ŷ = y ⊕ ReLU(C_1×1(y)), where ⊕ denotes element-wise addition and C_1×1 denotes a normalized 1×1 convolution.
  • To reduce loss during fusion, the filtered features are fused with the features in the label mask path by concatenation; a sketch follows.
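A sketch of the FAL as described. The patent writes only C_1×1 for a "normalized 1×1 convolution"; reading that as a 1×1 convolution followed by batch normalization is an assumption made here.

```python
import torch
import torch.nn as nn

class FeatureAdaptationLayer(nn.Module):
    """Res-block with a convolutional path and an identity path:
    out = y + ReLU(C_1x1(y))."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_path = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1, stride=1),
            nn.BatchNorm2d(channels),  # assumption: the 'normalization' of C_1x1
            nn.ReLU(inplace=True))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        return y + self.conv_path(y)  # element-wise addition with identity path

# The filtered edge features are then concatenated with label-mask-path features:
# fused = torch.cat([fal(edge_feat), mask_feat], dim=1)
```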
  • 2) Low-level features extracted from the spliced image by a shallow network are fed into the label mask path and fused with the features output by the upsampling layers of the decoder.
  • Low-level features usually refer to local features of image detail, such as edges, corners, or gradients.
  • The red dashed path in FIG. 4 denotes the extraction of the low-level features, which can provide more discriminative information. From left to right are four downsampling layers, performing 8×, 4×, 2×, and 1× downsampling with convolution kernel size/stride of 8/8, 4/4, 2/2, and 1/1, respectively (sketched below). Fusing the low-level features with the high-level features in the label mask path can enhance the high-resolution representation.
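A sketch of the four shallow downsampling branches; the output channel width is an assumption, since the patent specifies only the kernel sizes and strides.

```python
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """Extracts low-level features from the spliced image at four scales.
    Each branch is a single convolution whose kernel size equals its stride,
    giving 8x, 4x, 2x, and 1x downsampling."""
    def __init__(self, in_ch=3, out_ch=32):  # out_ch is an assumption
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=k) for k in (8, 4, 2, 1))

    def forward(self, x):
        # One feature map per scale, fused with the matching-resolution
        # upsampling output in the label mask path decoder.
        return [branch(x) for branch in self.branches]
```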
  • 3) Directly fused features are too coarse for localization, so the proposed network feeds the fused features into a squeeze-excitation attention mechanism (SEAM) module.
  • SEAM can be regarded as a simple channel attention mechanism; its structure is shown in FIG. 3.
  • The SEAM input is passed through ordinary operations such as convolution to obtain a feature U = [u_1, u_2, ..., u_C] with C channels. Unlike a conventional CNN, three operations then recalibrate the obtained features.
  • The first is a squeeze operation on the features obtained by convolution: z_c = F_sq(u_c) = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i, j), which compresses the features along the spatial dimensions into a channel-level global descriptor z = [z_1, z_2, ..., z_C].
  • Next, an excitation operation learns the relationships among the channels and obtains per-channel weights: e = F_ex(z, W) = σ(G(z, W)) = σ(W_2 ReLU(W_1 z)), where F_ex denotes the excitation operation, σ the sigmoid activation function, G the gating mechanism implemented by ReLU, and r the dimensionality reduction ratio of the weights W_1 and W_2.
  • The weights output by the excitation are regarded as the importance of each feature channel after feature selection and are applied to the previous channels by channel-wise multiplication: x̃_c = F_scale(u_c, e_c) = e_c · u_c, where F_scale(u_c, e_c) denotes channel-wise multiplication of u_c and e_c. This completes the recalibration of the original features along the channel dimension. A sketch of this recalibration follows.
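A sketch of the SEAM recalibration, following the standard squeeze-excitation formulation the text describes; the reduction ratio r = 16 is a common default, not a value stated in the patent.

```python
import torch
import torch.nn as nn

class SEAM(nn.Module):
    """Squeeze-excitation channel attention: squeeze -> excite -> rescale."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),  # W1: compress channels by ratio r
            nn.ReLU(inplace=True),               # gating mechanism G
            nn.Linear(channels // r, channels),  # W2: restore channel dimension
            nn.Sigmoid())                        # sigma: per-channel weights in (0, 1)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = u.shape
        z = u.mean(dim=(2, 3))           # squeeze: spatial global average, z_c
        e = self.fc(z).view(b, c, 1, 1)  # excitation: channel weights e_c
        return u * e                     # scale: channel-wise recalibration
```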
  • Step 3: Design a multi-task loss function.
  • The multi-task loss function of the present invention comprises three terms: first the label mask loss, second the mask edge loss, and third the image edge loss.
  • The overall loss function can be expressed as: L_total = L_label_mask + λ_1 L_label_edge + λ_2 L_image_edge.
  • Focal loss is used as the label mask loss: L_label_mask = -(1/N) Σ_{i,j} [α (1 - P_{i,j})^γ Ĝ_{i,j} log P_{i,j} + (1 - α) P_{i,j}^γ (1 - Ĝ_{i,j}) log(1 - P_{i,j})], where Ĝ_{i,j} and P_{i,j} denote, respectively, the estimated label at pixel (i, j) and the probability of the pixel being predicted as spliced.
  • For the mask edge, ordinary binary cross entropy (BCE) is used as the loss function: L_label_edge = -(1/N) Σ_{i,j} [Q̂_{i,j} log Q_{i,j} + (1 - Q̂_{i,j}) log(1 - Q_{i,j})], where Q̂_{i,j} and Q_{i,j} denote, respectively, the estimated mask edge label at pixel (i, j) and the probability of the pixel being predicted as a mask edge.
  • For the image edge, mean square error (MSE) is used as the loss function: L_image_edge = (1/N) Σ_{i,j} (S_{i,j} - Ŝ_{i,j})², where S_{i,j} and Ŝ_{i,j} denote the ground-truth and estimated image edge values. A sketch of the combined loss follows.
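A sketch of the three-term loss, assuming each prediction head outputs sigmoid probabilities; α = 0.25, γ = 2, and λ_1 = λ_2 = 1 are the values given in the detailed description.

```python
import torch
import torch.nn.functional as F

def focal_loss(p, g, alpha=0.25, gamma=2.0, eps=1e-6):
    # Label mask loss: focal loss over per-pixel splice probabilities p,
    # with ground-truth mask g in {0, 1}.
    p = p.clamp(eps, 1 - eps)
    pos = alpha * (1 - p) ** gamma * g * torch.log(p)
    neg = (1 - alpha) * p ** gamma * (1 - g) * torch.log(1 - p)
    return -(pos + neg).mean()

def total_loss(mask_p, mask_g, edge_p, edge_g, img_edge_p, img_edge_g,
               lam1=1.0, lam2=1.0):
    l_mask = focal_loss(mask_p, mask_g)                  # label mask: focal loss
    l_edge = F.binary_cross_entropy(edge_p, edge_g)      # mask edge: BCE
    l_img_edge = F.mse_loss(img_edge_p, img_edge_g)      # image edge: MSE
    return l_mask + lam1 * l_edge + lam2 * l_img_edge
```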
  • Step 4: Optimize the training.
  • The experiments of the present invention are implemented with the PyTorch framework on Ubuntu 16.04, with a GeForce GTX 1080 Ti GPU.
  • The scheme adopts adaptive moment estimation (Adam) as the optimizer; the learning rate is set to 1×10⁻³ and reduced to 1×10⁻⁴ after 30 epochs; training runs for 300 epochs in total with a batch size of 8.
  • The model with the highest localization score on the test data is selected as the final model. A sketch of this schedule follows.
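A sketch of the optimizer and learning-rate schedule described above; the model, the data loader, and the three-head output convention are placeholder assumptions.

```python
import torch

# model, train_loader, and total_loss are assumed defined as in the sketches above;
# the DataLoader is assumed to use batch_size=8.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Drop the learning rate from 1e-3 to 1e-4 after 30 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30], gamma=0.1)

for epoch in range(300):  # 300 epochs in total
    for images, mask_gt, mask_edge_gt, img_edge_gt in train_loader:
        mask_p, mask_edge_p, img_edge_p = model(images)  # assumed prediction heads
        loss = total_loss(mask_p, mask_gt, mask_edge_p, mask_edge_gt,
                          img_edge_p, img_edge_gt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```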
  • Step 5: Input the image to be detected into the model saved in Step 4 to obtain the splicing localization result.
  • The attention mechanism adopts the squeeze-excitation attention mechanism.
  • In practice, other attention mechanisms, such as the convolutional block attention module, can be used instead and also achieve good splicing localization results.
  • In a specific implementation, a multi-task loss function is designed to learn the edge information of the image, the edge information of the spliced region, and the spliced region simultaneously, improving the localization of splicing edges; a shallow network extracts low-level texture features, enhancing the feature representation ability of the proposed network; finally, the squeeze-excitation attention mechanism recalibrates the fused features so that the model pays more attention to, and assigns greater weight to, the features useful for localizing the spliced region.
  • A two-stream network comprising an edge-guided path and a label mask path is designed.
  • A multi-task loss function learns the image edges, the mask edges, and the label mask.
  • Features from the edge-guided path are fed into the label mask path through feature adaptation layers.
  • The channel attention mechanism recalibrates the fused features in the label mask path, assigning larger weights to features important for discrimination and thereby improving the expressive ability of the features.
  • The innovations of the present invention are: 1) the multi-task loss function introduces the edge loss of the image and the edge loss of the spliced region; 2) low-level features of a shallow network are fused; 3) feature adaptation layers are introduced between the edge-guided path and the label mask path; 4) the squeeze-excitation attention mechanism is introduced.
  • The present invention introduces shallow low-level features, which provide more detail and improve the feature representation ability of the network.
  • The present invention introduces the image edges and the spliced-region edges as supervisory information and designs a multi-task loss function, enabling more accurate localization of the spliced region.
  • The present invention introduces the squeeze-excitation attention mechanism to recalibrate the fused features, making the model attend to the features that contribute most to localization and yielding more accurate splicing localization results.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The present invention discloses an attention-mechanism-based image splicing localization detection method. Step 1: prepare an image splicing dataset and divide it into three parts: a training set, a validation set, and a test set. Step 2: design a two-stream multi-task learning neural network structure. Step 3: design a multi-task loss function. Step 4: optimize training to obtain a spliced-region localization model. Step 5: input the image to be detected into the model trained in Step 4 to obtain the splicing localization result. Compared with the prior art, the present invention has the following advantages: introducing shallow low-level features provides more detail and improves the feature representation ability of the network; introducing the image edges and the spliced-region edges as supervisory information and designing a multi-task loss function enables more accurate localization of the spliced region; and introducing the squeeze-excitation attention mechanism to recalibrate the fused features makes the model attend to the features that contribute most to localization, yielding more accurate splicing localization results.

Description

An attention-mechanism-based image splicing localization detection method

Technical Field

The present invention relates to the technical field of image splicing localization detection methods, and specifically to an attention-mechanism-based image splicing localization detection method.

Background Art

As an important information carrier, digital images are widely distributed and disseminated on the Internet; at the same time, convenient image tampering operations have given rise to a series of image security problems. Image splicing is a common image tampering method: a region of one image (called the donor image) is copied and, after geometric operations such as scaling and rotation, pasted into a region of another image (called the recipient image); the composite image is then subjected to post-processing operations such as Gaussian filtering and image enhancement so that the spliced region is consistent with the recipient image. Post-processing of the spliced-region edges makes splicing localization more challenging. For entertainment, people may use image splicing to paste a blue sea and sky into a casual photograph, faking a pleasant travel scene. In practice, however, lawbreakers may use image splicing to forge propaganda for political purposes or to report fake news, causing adverse social effects. Therefore, analyzing whether an image has undergone a splicing operation and localizing the spliced region is of great practical significance.
Existing image splicing localization techniques mainly comprise methods based on traditional features and methods based on deep learning. Traditional image splicing localization methods locate the spliced region using features such as sensor pattern noise, the interpolation pattern of the color filter array, or the JPEG compression traces of the spliced region. However, these traditional methods each target a specific image property and are not applicable to all splicing types. Deep learning-based methods mainly exploit the data-driven power of big data to learn the features of the spliced region and then localize it. However, most existing deep learning methods learn only from spliced images and the corresponding ground-truth masks, ignoring the effect of the image edges on the edges of the spliced region, so the boundaries of the localized region are often unsatisfactory. In addition, existing deep learning-based methods attend only to the high-level features of the deeper layers of convolutional networks while ignoring the low-level features of shallow layers, resulting in low splicing localization accuracy.

Disadvantage 1 of the prior art: it uses only the deep features of the convolutional network, not the low-level features output by shallow layers, so the splicing localization results leave room for improvement. The low-level features output by shallow layers contain the local features of the image and some of its detail; this information can improve the feature representation ability of the network and further improve localization.

Disadvantage 2 of the prior art: it utilizes only the spliced image, not the edge information of the image or of the spliced region. The edge information of the image and of the spliced region can guide the boundary of the spliced region and improve the accuracy of splicing-edge localization.

Disadvantage 3 of the prior art: it fuses features naively, without recalibration by an attention mechanism, which makes the output features less discriminative, so the localization results leave room for improvement.
Summary of the Invention

The technical problem to be solved by the present invention is to overcome the above defects by providing an attention-mechanism-based image splicing localization detection method that: designs a multi-task loss function to learn the edge information of the image, the edge information of the spliced region, and the spliced region simultaneously, improving the localization of splicing edges; uses a shallow network to extract low-level texture features, enhancing the feature representation ability of the proposed network; and finally uses a squeeze-excitation attention mechanism to recalibrate the fused features, so that the model pays more attention to, and assigns greater weight to, the features useful for localizing the spliced region.

To solve the above technical problem, the present invention provides the following technical solution: an attention-mechanism-based image splicing localization detection method comprising the following steps:

Step 1: Prepare an image splicing dataset and divide it into three parts: a training set, a validation set, and a test set;

Step 2: Design a two-stream multi-task learning neural network structure;

Step 3: Design a multi-task loss function;

Step 4: Optimize the training to obtain a spliced-region localization model;

Step 5: Input the image to be detected into the model trained in Step 4 to obtain the splicing localization result.
Preferably, Step 1 uses four benchmark image splicing datasets, CASIA 1.0 (461 images), CASIA 2.0 (5,123 images), the Carvalho dataset (100 images), and the Columbia dataset (180 images), plus two synthetic splicing datasets, spliced_NIST (13,575 images) and spliced_Dresden (35,712 images); each dataset is divided into training, validation, and test sets at a ratio of 7:2:1.

Preferably, Step 2 comprises an edge-guided path and a label mask path, wherein the edge-guided path is a U-Net encoder-decoder path supervised by the edges of the image, and the label mask path is likewise a U-Net encoder-decoder path supervised by the ground-truth mask of the spliced region and the edges of the spliced region.

Preferably, the multi-task loss function in Step 3 comprises three terms: first a label mask loss, second a mask edge loss, and third an image edge loss.

Preferably, the experiments in Step 4 are implemented with the PyTorch framework on Ubuntu 16.04, with a GeForce GTX 1080 Ti GPU; adaptive moment estimation is used as the optimizer; the learning rate is set to 1×10⁻³ and reduced to 1×10⁻⁴ after 30 epochs; training runs for 300 epochs in total with a batch size of 8.
Compared with the prior art, the present invention has the following advantages: (1) introducing shallow low-level features provides more detail and improves the feature representation ability of the network;

(2) introducing the image edges and the spliced-region edges as supervisory information and designing a multi-task loss function enables more accurate localization of the spliced region;

(3) introducing the squeeze-excitation attention mechanism to recalibrate the fused features makes the model attend to the features that contribute most to localization, yielding more accurate splicing localization results.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of the structure of the attention-mechanism-based image splicing localization detection method of the present invention.

FIG. 2 is a structural diagram of the feature adaptation layer (FAL) of the method.

FIG. 3 is a schematic diagram of the squeeze-excitation attention mechanism (SEAM) of the method.

FIG. 4 shows splicing localization results of the method on part of the test sets.

FIG. 5 shows the localization results of the method on different splicing datasets.

Detailed Description of the Embodiments

The present invention is described in further detail below with reference to the accompanying drawings.
An attention-mechanism-based image splicing localization detection method comprises the following steps:

Step 1: Use four benchmark image splicing datasets, CASIA 1.0 (461 images), CASIA 2.0 (5,123 images), the Carvalho dataset (100 images), and the Columbia dataset (180 images), plus two synthetic splicing datasets, spliced_NIST (13,575 images) and spliced_Dresden (35,712 images); divide each dataset into training, validation, and test sets at a ratio of 7:2:1.

Step 2: Design a two-stream multi-task learning neural network comprising an edge-guided path and a label mask path. The edge-guided path is a U-Net encoder-decoder path that uses the edges of the image for supervision. The encoder extracts discriminative features from the input spliced image, and the decoder further processes the extracted features to obtain a pixel-wise image edge prediction map. The encoder of the edge-guided path consists of four consecutive groups of convolutional modules and downsampling layers; each convolutional module consists of a convolutional layer, a batch normalization (BN) layer, and a rectified linear unit (ReLU), where the convolutional layers use 3×3 kernels with stride 1, and downsampling is implemented by a convolution with a 4×4 kernel and stride 2. The decoder of the edge-guided path is composed of four consecutive groups of upsampling layers and convolutional modules. The upsampling layers are implemented by bilinear interpolation; the width and height of the feature map double after each upsampling. The encoder and decoder are connected by a convolutional module. Finally, a convolutional layer with a 1×1 kernel and stride 1 is empirically used to refine the upsampled features. Upsampling nevertheless incurs feature loss, so skip connections between the contracting and expanding paths of the U-Net are used to reuse the original features and compensate for that loss.
The label mask path of the proposed network is structurally similar to the edge-guided path as a whole; it is also a U-Net encoder-decoder path. The ground-truth mask of the spliced region and the edges of the spliced region are used to supervise the label mask path. It differs from the edge-guided path in the following respects:
1) Features from the edge-guided path are filtered by Feature Adaptation Layers (FALs) and then fed into the label mask path, where they are fused with the features in the label mask path. The FAL is composed of a Res-block, whose structure is shown in FIG. 2. The FAL contains a convolutional path and an identity path, where the convolutional path consists of a convolutional layer with a 1×1 kernel and stride 1 followed by a ReLU layer. Let the feature input to the FAL be y; the output of the FAL can then be expressed as

ŷ = y ⊕ ReLU(C_1×1(y)),      (1)

where ⊕ denotes element-wise addition and C_1×1 denotes a normalized 1×1 convolution. To reduce loss during fusion, the filtered features are fused with the features in the label mask path by concatenation.
2) Low-level features extracted from the spliced image by a shallow network are fed into the label mask path and fused with the features output by the upsampling layers of the decoder. Low-level features usually refer to local features of image detail, such as edges, corners, or gradients. The red dashed path in FIG. 4 denotes the extraction of the low-level features, which can provide more discriminative information. From left to right are four downsampling layers, performing 8×, 4×, 2×, and 1× downsampling with convolution kernel size/stride of 8/8, 4/4, 2/2, and 1/1, respectively. Fusing the low-level features with the high-level features in the label mask path can enhance the high-resolution representation.
3) Directly fused features are too coarse for localization, so the proposed network feeds the fused features into a squeeze-excitation attention mechanism (SEAM) module. SEAM can be regarded as a simple channel attention mechanism, whose structure is shown in FIG. 3. Let X denote the SEAM input; after a series of ordinary operations such as convolution, a feature U = [u_1, u_2, ..., u_C] with C channels is obtained, where u_c = v_c * X, * denotes convolution, and V = [v_1, v_2, ..., v_C] denotes the convolution kernels.

Unlike a conventional CNN, three operations are then used to recalibrate the obtained features. The first is a squeeze operation on the features obtained by convolution:

z_c = F_sq(u_c) = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i, j),      (2)

where F_sq denotes the squeeze operation, yielding the channel-level global descriptor z = [z_1, z_2, ..., z_C]. Features are compressed along the spatial dimensions, turning each two-dimensional feature channel into a real number z_c; this real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels.

Next, an excitation operation learns the relationships among the channels and obtains the weights of the different channels:

e = F_ex(z, W) = σ(G(z, W)) = σ(W_2 ReLU(W_1 z)),      (3)

where F_ex denotes the excitation operation, σ the sigmoid activation function, G the gating mechanism implemented by ReLU, and r the dimensionality reduction ratio of the weights W_1 and W_2. A mechanism similar to the gates in recurrent neural networks generates a weight for each feature channel through the parameters W.

Finally, the weights output by the excitation are regarded as the importance of each feature channel after feature selection and are applied to the previous channels by channel-wise multiplication:

x̃_c = F_scale(u_c, e_c) = e_c · u_c,      (4)

where F_scale(u_c, e_c) denotes channel-wise multiplication of u_c and e_c. This completes the recalibration of the original features along the channel dimension.
Step 3: Design a multi-task loss function. The multi-task loss function of the present invention mainly comprises three terms: first the label mask loss, second the mask edge loss, and third the image edge loss. The overall loss function can be expressed as:

L_total = L_label_mask + λ_1 L_label_edge + λ_2 L_image_edge.      (5)

To address the imbalance between positive and negative samples and the inconsistent difficulty of samples, focal loss is used as the label mask loss:

L_label_mask = -(1/N) Σ_{i,j} [α (1 - P_{i,j})^γ Ĝ_{i,j} log P_{i,j} + (1 - α) P_{i,j}^γ (1 - Ĝ_{i,j}) log(1 - P_{i,j})],      (6)

where Ĝ_{i,j} and P_{i,j} denote, respectively, the estimated label at pixel (i, j) and the probability of the pixel being predicted as a spliced pixel; α balances the ratio of positive to negative samples, and γ balances the ratio of hard to easy samples. In this experiment they are empirically set to α = 0.25 and γ = 2.

For the mask edge, ordinary binary cross entropy (BCE) is used as the loss function:

L_label_edge = -(1/N) Σ_{i,j} [Q̂_{i,j} log Q_{i,j} + (1 - Q̂_{i,j}) log(1 - Q_{i,j})],      (7)

where Q̂_{i,j} and Q_{i,j} denote, respectively, the estimated mask edge label at pixel (i, j) and the probability of the pixel being predicted as a mask edge.

For the image edge, mean square error (MSE) is used as the loss function:

L_image_edge = (1/N) Σ_{i,j} (S_{i,j} - Ŝ_{i,j})²,      (8)

where S_{i,j} and Ŝ_{i,j} denote, respectively, the ground-truth and estimated values of the image edge.
Step 4: Optimize the training. The experiments of the present invention are implemented with the PyTorch framework on Ubuntu 16.04, with a GeForce GTX 1080 Ti GPU. The scheme of the present invention adopts adaptive moment estimation (Adam) as the optimizer; the learning rate is set to 1×10⁻³ and reduced to 1×10⁻⁴ after 30 epochs; training runs for 300 epochs in total with a batch size of 8. The weighting coefficients λ_1 and λ_2 of the loss function have little influence on the final splicing localization results, and the best detection results are obtained when λ_1 = λ_2 = 1; the experiments therefore set λ_1 = λ_2 = 1. Finally, the model with the highest localization score on the test data is selected as the final model.

Step 5: Input the image to be detected into the model saved in Step 4 to obtain the splicing localization result.

The present invention has been validated on several commonly used image splicing datasets, and the experimental results demonstrate that the proposed scheme is feasible. With the F1-score as the evaluation criterion, the localization results on different splicing datasets are shown in FIG. 5, and the splicing localization results on part of the test sets are shown in FIG. 4.
The attention mechanism adopts the squeeze-excitation attention mechanism; in practice, other attention mechanisms, such as the convolutional block attention module, can be used instead and also achieve good splicing localization results.

In a specific implementation, the present invention designs a multi-task loss function to learn the edge information of the image, the edge information of the spliced region, and the spliced region simultaneously, improving the localization of splicing edges; it uses a shallow network to extract low-level texture features, enhancing the feature representation ability of the proposed network; finally, it uses the squeeze-excitation attention mechanism to recalibrate the fused features, so that the model pays more attention to, and assigns greater weight to, the features useful for localizing the spliced region.

A two-stream network (comprising an edge-guided path and a label mask path) is designed, and a multi-task loss function is adopted to learn the image edges, the mask edges, and the label mask. Feature adaptation layers feed the features of the edge-guided path into the label mask path. The fused features in the label mask path are recalibrated by the channel attention mechanism, which assigns larger weights to features important for discrimination, thereby improving the expressive ability of the features.

The innovations are: 1) the multi-task loss function designed by the present invention introduces the edge loss of the image and the edge loss of the spliced region; 2) low-level features of a shallow network are fused; 3) feature adaptation layers are introduced between the edge-guided path and the label mask path; 4) the squeeze-excitation attention mechanism is introduced.

(1) The present invention introduces shallow low-level features, which provide more detail and improve the feature representation ability of the network.

(2) The present invention introduces the image edges and the spliced-region edges as supervisory information and designs a multi-task loss function, enabling more accurate localization of the spliced region.

(3) The present invention introduces the squeeze-excitation attention mechanism to recalibrate the fused features, making the model attend to the features that contribute most to localization and yielding more accurate splicing localization results.

The present invention and its embodiments have been described above without limitation; what is shown in the accompanying drawings is only one embodiment of the present invention, and the actual structure is not limited thereto. In short, if a person of ordinary skill in the art, inspired by the present invention and without departing from its inventive purpose, devises without inventive effort structures and embodiments similar to this technical solution, they shall all fall within the scope of protection of the present invention.

Claims (5)

  1. An attention-mechanism-based image splicing localization detection method, characterized by comprising the following steps:
    Step 1: Prepare an image splicing dataset and divide it into three parts: a training set, a validation set, and a test set;
    Step 2: Design a two-stream multi-task learning neural network structure;
    Step 3: Design a multi-task loss function;
    Step 4: Optimize the training to obtain a spliced-region localization model;
    Step 5: Input the image to be detected into the model trained in Step 4 to obtain the splicing localization result.
  2. The attention-mechanism-based image splicing localization detection method according to claim 1, characterized in that Step 1 uses four benchmark image splicing datasets, CASIA 1.0 (461 images), CASIA 2.0 (5,123 images), the Carvalho dataset (100 images), and the Columbia dataset (180 images), plus two synthetic splicing datasets, spliced_NIST (13,575 images) and spliced_Dresden (35,712 images), and each dataset is divided into training, validation, and test sets at a ratio of 7:2:1.
  3. The attention-mechanism-based image splicing localization detection method according to claim 1, characterized in that Step 2 comprises an edge-guided path and a label mask path, wherein the edge-guided path is a U-Net encoder-decoder path supervised by the edges of the image, and the label mask path is likewise a U-Net encoder-decoder path supervised by the ground-truth mask of the spliced region and the edges of the spliced region.
  4. The attention-mechanism-based image splicing localization detection method according to claim 1, characterized in that the multi-task loss function in Step 3 comprises three terms: first a label mask loss, second a mask edge loss, and third an image edge loss.
  5. The attention-mechanism-based image splicing localization detection method according to claim 1, characterized in that the experiments in Step 4 are implemented with the PyTorch framework on Ubuntu 16.04, with a GeForce GTX 1080 Ti GPU; adaptive moment estimation is used as the optimizer; the learning rate is set to 1×10⁻³ and reduced to 1×10⁻⁴ after 30 epochs; training runs for 300 epochs in total with a batch size of 8.
PCT/CN2022/138200 2021-12-15 2022-12-09 An attention-mechanism-based image splicing localization detection method WO2023109709A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111532297.2A CN114418840A (zh) 2021-12-15 2021-12-15 An attention-mechanism-based image splicing localization detection method
CN202111532297.2 2021-12-15

Publications (1)

Publication Number Publication Date
WO2023109709A1 true WO2023109709A1 (zh) 2023-06-22

Family

ID=81268034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138200 WO2023109709A1 (zh) 2021-12-15 2022-12-09 An attention-mechanism-based image splicing localization detection method

Country Status (2)

Country Link
CN (1) CN114418840A (zh)
WO (1) WO2023109709A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418840A (zh) * 2021-12-15 2022-04-29 深圳先进技术研究院 一种基于注意力机制的图像拼接定位检测方法
CN114764858B (zh) * 2022-06-15 2022-11-01 深圳大学 一种复制粘贴图像识别方法、装置、计算机设备及存储介质
CN116912184B (zh) * 2023-06-30 2024-02-23 哈尔滨工业大学 一种基于篡改区域分离和区域约束损失的弱监督深度修复图像篡改定位方法及系统


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493927A (zh) * 2009-02-27 2009-07-29 西北工业大学 基于边缘方向特征的图像可信度检测方法
EP3171297A1 (en) * 2015-11-18 2017-05-24 CentraleSupélec Joint boundary detection image segmentation and object recognition using deep learning
CN110414670A (zh) * 2019-07-03 2019-11-05 南京信息工程大学 一种基于全卷积神经网络的图像拼接篡改定位方法
CN111080629A (zh) * 2019-12-20 2020-04-28 河北工业大学 一种图像拼接篡改的检测方法
CN112465700A (zh) * 2020-11-26 2021-03-09 北京航空航天大学 一种基于深度聚类的图像拼接定位装置及方法
CN114418840A (zh) * 2021-12-15 2022-04-29 深圳先进技术研究院 一种基于注意力机制的图像拼接定位检测方法

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912183A (zh) * 2023-06-30 2023-10-20 哈尔滨工业大学 一种基于边缘引导和对比损失的深度修复图像的篡改定位方法及系统
CN116912183B (zh) * 2023-06-30 2024-02-20 哈尔滨工业大学 一种基于边缘引导和对比损失的深度修复图像的篡改定位方法及系统
CN117291809A (zh) * 2023-11-27 2023-12-26 山东大学 一种基于深度学习的集成电路图像拼接方法及系统
CN117291809B (zh) * 2023-11-27 2024-03-15 山东大学 一种基于深度学习的集成电路图像拼接方法及系统

Also Published As

Publication number Publication date
CN114418840A (zh) 2022-04-29

Similar Documents

Publication Publication Date Title
WO2023109709A1 (zh) An attention-mechanism-based image splicing localization detection method
CN111080629A (zh) A detection method for image splicing tampering
JP7246104B2 (ja) License plate recognition method based on text line recognition
CN108960404B (zh) An image-based crowd counting method and device
Abbas et al. Lightweight deep learning model for detection of copy-move image forgery with post-processed attacks
KR20180065889A (ko) Method and apparatus for detecting a target
CN110766020A (zh) A system and method for multilingual natural scene text detection and recognition
Johnson et al. Sparse codes as alpha matte
CN109635634A (zh) A pedestrian re-identification data augmentation method based on random linear interpolation
Liu et al. Overview of image inpainting and forensic technology
Li et al. Target-guided feature super-resolution for vehicle detection in remote sensing images
CN116681636A (zh) A lightweight infrared and visible image fusion method based on convolutional neural networks
CN112651333A (zh) Silent liveness detection method and apparatus, terminal device, and storage medium
CN113807237B (zh) Training of a liveness detection model, liveness detection method, computer device, and medium
Xu et al. COCO-Net: A dual-supervised network with unified ROI-loss for low-resolution ship detection from optical satellite image sequences
Hong et al. Near-infrared image guided reflection removal
CN115661611A (zh) An infrared small-target detection method based on an improved Yolov5 network
Hu et al. Vehicle color recognition based on smooth modulation neural network with multi-scale feature fusion
CN117789293A (zh) A pedestrian re-identification method and system based on multi-feature separation, and computer-readable medium
Zhang et al. Deep joint neural model for single image haze removal and color correction
Conrad et al. Two-stage seamless text erasing on real-world scene images
CN110490053B (zh) A face attribute recognition method based on trinocular camera depth estimation
Tian et al. Deformable convolutional network constrained by contrastive learning for underwater image enhancement
Yuan et al. Structure flow-guided network for real depth super-resolution
Zhao et al. End‐to‐End Retinex‐Based Illumination Attention Low‐Light Enhancement Network for Autonomous Driving at Night

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22906456

Country of ref document: EP

Kind code of ref document: A1