WO2024032158A1 - Deep image watermarking method based on hybrid frequency-domain channel attention - Google Patents

Deep image watermarking method based on hybrid frequency-domain channel attention

Info

Publication number
WO2024032158A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
watermark information
watermark
encoder
noise
Prior art date
Application number
PCT/CN2023/101599
Other languages
English (en)
French (fr)
Inventor
张强
王宾
谭钧
陈蓉蓉
魏小鹏
Original Assignee
大连大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大连大学
Priority to US18/453,846 (US20240054594A1)
Publication of WO2024032158A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0021 Image watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Definitions

  • the invention relates to the fields of artificial neural networks and digital image watermarking, and specifically relates to a deep image watermarking method based on hybrid frequency domain channel attention.
  • the combination of deep neural networks and digital image watermarking algorithms has become a popular direction in the field of information hiding.
  • the combination of the two can not only protect the copyright information of images but also, thanks to the strong learning ability of neural networks, yield trained watermark models that can be applied to most image scenarios.
  • the neural network can fit the embedding and extraction of watermark information well, and allows the three originally separate parts (watermark embedding, image noise and watermark extraction) to all participate in the training of the neural network.
  • compared with traditional methods, both robustness and invisibility are improved.
  • the selection of channel features plays a certain role in image watermarking.
  • Selecting frequency domain components suitable for embedding watermarks as the weight of channel features in the frequency domain channel attention module can improve the performance of the watermark model.
  • the current watermark image has poor watermark extraction effect after JPEG compression, and the watermark image quality is poor.
  • the present invention provides a deep image watermarking method based on hybrid frequency domain channel attention, which combines an end-to-end deep watermark model with frequency domain channel attention.
  • this combination expands the application scope of deep neural networks in the field of image watermarking, and a new encoder structure is designed with the help of the frequency domain channel attention module, ultimately yielding higher quality watermark images and better decoded watermark information.
  • a deep image watermarking method based on hybrid frequency domain channel attention including the following steps:
  • Step 1: The watermark information processor generates a watermark information feature map
  • Step 2: The encoder generates a watermark image from the carrier image and watermark information feature map
  • Step 3: The noise layer takes the watermark image as input and generates a noise image through simulated differentiable noise
  • Step 4: The decoder downsamples the above noise image to restore the watermark information
  • Step 5: The adversarial discriminator classifies the carrier image and the watermark image to enable the encoder to generate high-quality watermark images.
  • step 1 is specifically: the watermark information processor takes the watermark information as input, diffuses the watermark information to each bit of information through the fully connected layer, and then transforms the diffused watermark information from one dimension into a two-dimensional feature map. The watermark information feature map is then generated through the diffusion convolution layer and attention module.
  • step 2 is specifically: the encoder takes the carrier image and the watermark information feature map as input, and generates the watermark image through the ConvBNReLU convolution block, the hybrid frequency domain channel attention module and the skip connection.
  • the hybrid frequency domain channel attention module in the encoder consists of two branches, one of which consists of multiple SENet attention modules.
  • the SENet attention module uses a global average pooling layer in the channel compression process, that is, the lowest frequency component in the two-dimensional discrete cosine transform is used as the weight for assigning channel features; the other branch consists of an FCA attention module.
  • the FCA attention module divides 64 frequency domain components according to the 8×8 block method of the JPEG compression principle.
  • the feature tensors generated by the FCA attention module branch and the SENet attention module branch are then concatenated in the channel dimension, and a ConvBNReLU convolution module is used for feature fusion.
  • step 4 is specifically: the decoder takes the noise image as input, and uses the ConvBNReLU convolution module and the SENet attention module to downsample and recover the watermark information.
  • the loss function for training the encoder includes terms that assist the encoder in generating high-quality watermark images,
  • I_CO is the carrier image
  • I_EN is the watermark image
  • E represents the encoder
  • θ_E is the parameter of the encoder E
  • M_EN is the watermark information feature map
  • A represents the adversarial discriminator
  • θ_A is the parameter of the adversarial discriminator A.
  • M is the original watermark information
  • M_D is the decoded and restored watermark information
  • D represents the decoder
  • θ_D is the parameter of the decoder D
  • I_NO is the noise image.
  • L_A = log(1 - A(θ_A, E(θ_E, I_CO, M_EN))) + log(A(θ_A, I_CO))
  • A represents the adversarial discriminator
  • θ_A is the parameter of the adversarial discriminator A
  • E represents the encoder
  • θ_E is the parameter of the encoder E
  • I_CO is the carrier image
  • M_EN is the watermark information feature map.
  • Frequency domain channel attention is introduced to extract features from the carrier image; using multiple frequency domain components on each channel reduces the amount of information lost in the encoding process, and the 16 low-frequency components selected as the weight parameters of the channel attention are more robust to JPEG compression than mid-frequency and high-frequency components;
  • a dual-branch structure is designed.
  • the two branches use different attention to learn feature maps.
  • the feature maps generated by the two branches are concatenated in the channel dimension and then fused through a convolution layer, which greatly improves the quality of the generated watermark image.
  • Figure 1 is a network model architecture diagram of the overall method of the present invention
  • Figure 2 is a schematic diagram of frequency domain channel attention
  • Figure 3 is a schematic diagram of the selection of frequency domain components
  • Figure 4 shows the test results after training specifically on JPEG compression noise
  • Figure 5 shows the experimental results under a variety of different noise tests after mixed noise training.
  • Figure 1 shows the overall method network model architecture of the present invention. Deep image watermarking method based on hybrid frequency domain channel attention, including:
  • the watermark information processor takes the watermark information as input and diffuses the watermark information to each bit of information through the fully connected layer, and then transforms the diffused watermark information from one dimension to a two-dimensional feature map form, and then passes the diffusion convolution layer and attention module to generate watermark information feature maps.
  • the encoder takes the carrier image and watermark information feature map as input and generates the watermark image through the ConvBNReLU convolution block and the hybrid frequency domain channel attention module and skip connection.
  • the hybrid frequency domain channel attention module consists of two branches. One branch contains multiple SE attention modules, which use the lowest frequency component in the discrete cosine transform domain as the weight parameter; the other branch selects 16 low-frequency components in zigzag order based on 8×8 blocks as weight parameters.
  • the hybrid frequency domain channel attention module in the encoder is designed with two branches.
  • One branch is composed of multiple SENet attention modules, and the other branch is composed of an FCA attention module.
  • SENet uses a global average pooling layer in the channel compression process, that is, the lowest frequency component in the two-dimensional discrete cosine transform is used as the weight for assigning channel features, while the FCA attention module modifies the global average pooling layer based on the above principle so that multiple two-dimensional discrete cosine transform components can be selected.
  • the present invention also divides 64 frequency domain components according to the 8×8 block method of the JPEG compression principle, and selects 16 low-frequency components in a zigzag manner, starting from the lowest frequency component, as the compressed weights of the FCA attention module.
  • the feature tensors generated by the FCA attention module branch and the SENet attention module branch are then spliced in the channel dimension, and a ConvBNReLU convolution module is used for feature fusion.
  • DCT N in Figure 2 refers to the block discrete cosine transform; Freq N refers to the frequency component. See Figure 3 for a schematic diagram of the selection of frequency domain components.
  • the noise layer takes the watermark image as input and generates a noisy image through simulated differentiable noise; during training, for each batch of input watermark images, the noise layer randomly selects one of the configured noises to apply as distortion, simulating the noise environment in real scenarios.
  • the decoder takes the noise image as input and performs downsampling to recover the watermark information through the ConvBNReLU convolution block and SENet attention module.
  • the adversarial discriminator classifies carrier images and watermark images to help the encoder generate higher quality watermark images.
  • the adversarial discriminator consists of multiple ConvBNReLU modules with a convolution kernel size of 3×3 and a global average pooling layer.
  • the watermark feature map is amplified by a ConvBNReLU convolution module consisting of a convolution layer with a convolution kernel size of 3×3, a batch normalization layer, and a ReLU activation function, and its size is expanded to C×H×W through several diffusion convolution layers.
  • the encoder E with parameter θ_E takes an RGB color image of size 3×H×W, that is, the carrier image I_CO, and the watermark information feature map M_EN as input, and outputs an encoded image of size 3×H×W, that is, the watermark image I_EN.
  • the encoder uses a hybrid frequency channel attention block, including multiple SE channel attention modules and an FCA frequency domain channel attention module.
  • the principle of selecting multi-frequency components by the FCA attention module is:
  • x_2d is used as the input of the discrete cosine transform
  • H is the height of x_2d
  • W is the width of x_2d
  • the entire encoder consists of multiple ConvBNReLU convolution blocks with a convolution kernel size of 3×3, a mixed frequency channel attention module, and a convolution layer with a convolution kernel size of 1×1.
  • in the first step, it amplifies the carrier image through a ConvBNReLU convolution block with a convolution kernel size of 3×3, then applies the proposed mixed frequency channel attention module, which keeps the feature map size unchanged, and then uses a ConvBNReLU convolution block with a convolution kernel size of 3×3 to concentrate the feature maps obtained by the attention module.
  • the second step is to input the watermark information feature map obtained from the watermark information processor, together with the previously output carrier image and the feature map obtained by the mixed frequency channel attention module, into a ConvBNReLU convolution block with a convolution kernel size of 3×3 for feature fusion.
  • the third step is to concatenate the fused feature map and the carrier image fed by the skip connection into a new feature map, and send it to a convolution layer with a convolution kernel size of 1×1 to obtain the encoded image I_EN.
  • the encoder is trained to minimize the L2 distance between I_CO and I_EN by updating the parameters θ_E:
  • the robustness of the entire model is provided by the noise layer.
  • the noise in the noise layer is selected from the specified noise pool; the layer takes the encoded image I_EN as input and outputs the noise image I_NO of the same size.
  • for each batch of input encoded images, one noise is randomly selected from the configured set for distortion, to simulate the noise environment in real scenes.
  • the task of the decoder D with parameters θ_D is to recover the watermark information M_D of length L from the noise image I_NO. This part determines the ability of the entire model to extract watermarks.
  • the noise image I_NO is input to the ConvBNReLU layer with a convolution kernel size of 3×3, and the obtained feature map is downsampled through multiple SE attention modules.
  • the multi-channel tensor is converted into a single-channel tensor through a convolution layer with a convolution kernel size of 3×3, and the single-channel tensor is reshaped to obtain the decoded watermark information M_D.
  • the goal of decoder training is to minimize the L2 distance between the original watermark information M and M_D by updating the parameter θ_D:
  • because it plays an important role in the bit error rate metric, the loss function L_D takes the largest proportion of the total loss function.
  • the adversarial discriminator A consists of multiple ConvBNReLU modules with a convolution kernel size of 3×3 and a global average pooling layer. Under the influence of the adversarial network, the encoder tries to deceive the adversary as much as possible, so that the adversarial discriminator cannot make correct judgments on I_CO and I_EN, and updates the parameters θ_E to minimize the loss function L_E2, improving the encoding quality of the encoder:
  • the discriminator with parameter θ_A acts as a binary classifier that needs to distinguish I_CO from I_EN.
  • L_A is the loss function for the adversarial discriminator.
  • λ_E, λ_D and λ_A are the weight parameters of the respective loss functions, set to 1, 10 and 0.0001 during training.
  • the loss function is divided into two parts.
  • One part is the loss function for the encoder and the decoder.
  • L_D and the encoder loss terms, which assist the encoder in generating high-quality watermark images: an L2 loss makes the carrier image and watermark image as visually similar as possible,
  • and the loss generated by the adversarial discriminator assists the encoder; the other part is the loss function L_A used to train the discriminator.
  • 10,000 images can be randomly selected from the ImageNet image data set as the training set of the model, and then 5,000 images can be randomly selected from the COCO image data set as the verification set and 5,000 images as the test set.
  • the data set is preprocessed by cropping to a size of 128×128 before being input to the model for training, the batch size is set to 16, and the number of training epochs is set to 150.
  • dynamic Adam is chosen as the optimization algorithm during training, with the learning rate set to 0.001.
  • the embedding strength of the watermark information is set to 1.
  • PSNR and SSIM are used to calculate the similarity between the carrier image and the watermark image to represent the imperceptibility of the watermarking algorithm, and the bit error rate between the original watermark information and the watermark information recovered by the decoder is used to represent the robustness of the watermarking algorithm.
  • the test results after training specifically on JPEG compression noise are shown in Figure 4.
  • the single noise model means that the noise layer only includes one kind of noise, and the trained watermark model only has strong robustness to this noise.
  • the noise layer is set to noiseless, simulated JPEG-Mask and True JPEG compression.
  • the reason for this choice is that real JPEG compression is non-differentiable noise, and the feedback model parameters cannot be added to the model training.
  • the simulated JPEG-Mask is just a manually set JPEG compression template and cannot achieve the effect of real JPEG compression. Therefore, noiseless, JPEG-Mask and real JPEG compression are selected for mixed training to simulate JPEG compression in the real environment to the greatest extent.
  • the intensity factor of JPEG compression is set to 50.
  • the experimental results under various noise tests after mixed noise training are shown in Figure 5.
  • the mixed noise model sets a variety of noises in the noise layer, so that the trained model can achieve better robustness to most noises.
  • This embodiment provides a setting for training the mixed noise model.
  • the preset number of training epochs is 150. After training, the epochs corresponding to several local minima of the total validation loss are selected from the recorded training log, and their weights are imported into the model for testing.
  • test method: it should be emphasized that the watermark images used during training differ from those used during testing.
  • during training, the watermark image generated by the encoder is directly input into the noise layer to participate in the entire training.
  • during testing, the weight parameters of the watermark information processor, encoder, and decoder are fixed.
  • the difference I_diff between the watermark image generated by the encoder and the carrier image represents the watermark information.
  • Table 3 shows the results of comparing the quality of encoded images after a single training for each type of noise, while adjusting the intensity factor to bring the bit error rate close to 0%.
  • Table 4 shows the test results under different quality factors and different intensity factors after training specifically on JPEG compression noise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

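A deep image watermarking method based on hybrid frequency-domain channel attention, relating to the fields of artificial neural networks and digital image watermarking. The method comprises: Step 1: a watermark information processor generates a watermark information feature map; Step 2: an encoder generates a watermark image from a carrier image and the watermark information feature map; Step 3: a noise layer takes the watermark image as input and generates a noise image through simulated differentiable noise; Step 4: a decoder downsamples the noise image to recover the watermark information; Step 5: an adversarial discriminator classifies the carrier image and the watermark image so that the encoder generates high-quality watermark images. The method combines an end-to-end deep watermarking model with frequency-domain channel attention, broadening the application of deep neural networks in image watermarking, and designs a new encoder structure around the frequency-domain channel attention module, ultimately obtaining higher-quality watermark images and better-decoded watermark information.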

Description

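Deep image watermarking method based on hybrid frequency-domain channel attention
This application claims priority to Chinese patent application No. 202210955381.3, entitled "Deep image watermarking method based on hybrid frequency-domain channel attention", filed with the China Patent Office on August 10, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the fields of artificial neural networks and digital image watermarking, and in particular to a deep image watermarking method based on hybrid frequency-domain channel attention.
Background
In recent years, with the great success of deep neural networks in computer vision tasks, combining deep neural networks with digital image watermarking algorithms has become a popular direction in the field of information hiding. The combination not only protects the copyright information of images but, owing to the strong learning ability of neural networks, also yields trained watermarking models that apply to most image scenarios. In addition, a neural network can fit the embedding and extraction of watermark information well, and allows the three originally separate parts (watermark embedding, image noise and watermark extraction) to all take part in training, which improves both robustness and invisibility compared with traditional methods. The selection of channel features matters for image watermarking: choosing frequency-domain components suited to watermark embedding as the weights of channel features in a frequency-domain channel attention module can improve the performance of the watermarking model. At present, however, watermarks are extracted poorly from watermark images after JPEG compression, and the quality of the watermark images is poor.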
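Summary of the Invention
To solve the problems that watermarks are extracted poorly from watermark images after JPEG compression and that watermark image quality is poor, the present invention provides a deep image watermarking method based on hybrid frequency-domain channel attention. It combines an end-to-end deep watermarking model with frequency-domain channel attention, broadening the application of deep neural networks in image watermarking, and designs a new encoder structure around the frequency-domain channel attention module, ultimately obtaining higher-quality watermark images and better-decoded watermark information.
The technical solution adopted by the present invention to solve its technical problem is:
A deep image watermarking method based on hybrid frequency-domain channel attention, comprising the following steps:
Step 1: a watermark information processor generates a watermark information feature map;
Step 2: an encoder generates a watermark image from a carrier image and the watermark information feature map;
Step 3: a noise layer takes the watermark image as input and generates a noise image through simulated differentiable noise;
Step 4: a decoder downsamples the noise image to recover the watermark information;
Step 5: an adversarial discriminator classifies the carrier image and the watermark image so that the encoder generates high-quality watermark images.
Further, Step 1 is specifically: the watermark information processor takes the watermark information as input, diffuses it to every bit through a fully connected layer, reshapes the diffused watermark information from one dimension into a two-dimensional feature map, and then generates the watermark information feature map through diffusion convolution layers and an attention module.
Further, Step 2 is specifically: the encoder takes the carrier image and the watermark information feature map as input and generates the watermark image through ConvBNReLU convolution blocks, the hybrid frequency-domain channel attention module and a skip connection.
Further, the hybrid frequency-domain channel attention module in the encoder consists of two branches. One branch consists of multiple SENet attention modules; the SENet attention module uses a global average pooling layer during channel compression, that is, it uses the lowest-frequency component of the two-dimensional discrete cosine transform as the weight assigned to channel features. The other branch consists of one FCA attention module, which divides 64 frequency-domain components according to the 8×8 blocking used in JPEG compression and selects 16 low-frequency components in zigzag order, starting from the lowest-frequency component, as the compressed weights of the FCA attention module. The feature tensors produced by the FCA branch and the SENet branch are then concatenated along the channel dimension and fused by a ConvBNReLU convolution module.
Further, Step 4 is specifically: the decoder takes the noise image as input and recovers the watermark information by downsampling through ConvBNReLU convolution modules and SENet attention modules.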
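Further, the loss function for training the encoder includes terms used to assist the encoder in generating high-quality watermark images,

where I_CO is the carrier image, I_EN is the watermark image, E denotes the encoder, θ_E denotes the parameters of encoder E, and M_EN is the watermark information feature map; A denotes the adversarial discriminator, and θ_A denotes the parameters of adversarial discriminator A.
Further, the loss function L_D for training the decoder is:
L_D = MSE(M, M_D) = MSE(M, D(θ_D, I_NO))
where M is the original watermark information, M_D is the watermark information recovered by decoding, D denotes the decoder, θ_D denotes the parameters of decoder D, and I_NO is the noise image.
Further, the loss function L_A for training the adversarial discriminator is:
L_A = log(1 - A(θ_A, E(θ_E, I_CO, M_EN))) + log(A(θ_A, I_CO))
where A denotes the adversarial discriminator, θ_A denotes the parameters of adversarial discriminator A, E denotes the encoder, θ_E denotes the parameters of encoder E, I_CO is the carrier image, and M_EN is the watermark information feature map.
Compared with the prior art, the above technical solution adopted by the present invention has the following advantages:
Frequency-domain channel attention is introduced to extract features from the carrier image; using multiple frequency-domain components per channel reduces the information lost during encoding, and the 16 low-frequency components chosen as the weight parameters of the channel attention are more robust to JPEG compression than mid-frequency and high-frequency components;
A dual-branch structure is designed: the two branches apply different attention mechanisms to learn features from the feature map, and the feature maps they produce are concatenated along the channel dimension and then fused through a convolution layer, which considerably improves the quality of the generated watermark image.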
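Brief Description of the Drawings
Figure 1 is the network model architecture diagram of the overall method of the present invention;
Figure 2 is a schematic diagram of the principle of frequency-domain channel attention;
Figure 3 is a schematic diagram of the selection of frequency-domain components;
Figure 4 shows the test results after training specifically on JPEG compression noise;
Figure 5 shows the experimental results under a variety of noise tests after mixed-noise training.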
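Detailed Description
The technical solution of the present invention is described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Figure 1 shows the network model architecture of the overall method of the present invention. The deep image watermarking method based on hybrid frequency-domain channel attention comprises:
S1: The watermark information processor takes the watermark information as input, diffuses it to every bit through a fully connected layer, reshapes the diffused watermark information from one dimension into a two-dimensional feature map, and then generates the watermark information feature map through diffusion convolution layers and an attention module.
S2: The encoder takes the carrier image and the watermark information feature map as input and generates the watermark image through ConvBNReLU convolution blocks, the hybrid frequency-domain channel attention module and a skip connection. The hybrid frequency-domain channel attention module consists of two branches. One branch contains multiple SE attention modules, which use the lowest-frequency component of the discrete cosine transform domain as the weight parameter; the other branch selects 16 low-frequency components in zigzag order over 8×8 blocks as weight parameters.
The principle of frequency-domain channel attention is shown in Figure 2. Specifically, the hybrid frequency-domain channel attention module in the encoder is designed with two branches: one consists of multiple SENet attention modules and the other of one FCA attention module. SENet uses a global average pooling layer during channel compression, that is, it uses the lowest-frequency component of the two-dimensional discrete cosine transform as the weight assigned to channel features, whereas the FCA attention module modifies the global average pooling layer according to this principle so that multiple components of the two-dimensional discrete cosine transform can be selected. Following the 8×8 blocking used in JPEG compression, the present invention likewise divides 64 frequency-domain components and selects 16 low-frequency components in zigzag order, starting from the lowest-frequency component, as the compressed weights of the FCA attention module. The feature tensors produced by the FCA branch and the SENet branch are then concatenated along the channel dimension and fused by a ConvBNReLU convolution module. In Figure 2, DCT_N refers to the block discrete cosine transform and Freq_N refers to a frequency component. The selection of frequency-domain components is illustrated in Figure 3.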
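The zigzag selection described above can be sketched as follows. This is a minimal illustration, assuming the conventional JPEG zigzag scan over an 8×8 block; the function name `zigzag_indices` and the (row, column) index convention are hypothetical and are not taken from the patent, whose exact scan direction is shown only in Figure 3.

```python
def zigzag_indices(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order.

    Assumption: the conventional JPEG scan, which starts at the DC
    position (0, 0) and walks the anti-diagonals alternately.
    """
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Odd diagonals run with the row index increasing, even ones reversed.
        order.extend(diagonal if s % 2 else diagonal[::-1])
    return order


# The 16 lowest-frequency positions of an 8x8 block (hypothetical selection).
low_freq = zigzag_indices(8)[:16]
print(low_freq)
```

Under these assumptions the 16 selected positions all lie in the upper-left (low-frequency) corner of the block, starting from the DC position (0, 0).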
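S3: The noise layer takes the watermark image as input and generates a noisy image through simulated differentiable noise. During training, for each batch of input watermark images, the noise layer randomly selects one of the configured noises to apply as distortion, simulating the noise environment of real scenarios.
S4: The decoder takes the noise image as input and recovers the watermark information by downsampling through ConvBNReLU convolution blocks and SENet attention modules.
S5: The adversarial discriminator classifies the carrier image and the watermark image to help the encoder generate higher-quality watermark images. The adversarial discriminator consists of multiple ConvBNReLU modules with 3×3 convolution kernels and one global average pooling layer.
The above is described in detail below:
The watermark information processor is mainly responsible for processing the watermark information and feeding the processed feature map to the encoder. It receives binary watermark information of length L consisting of 0s and 1s and outputs a watermark information feature map of size C'×H×W, where C' is the number of channels of the feature map, H is its height and W is its width. Specifically, the randomly generated watermark information of length L is reshaped from one dimension into a two-dimensional feature map in {0,1}^(1×h×w), where L = h×w. It is then amplified by a ConvBNReLU convolution module, consisting of a convolution layer with a 3×3 kernel, a batch normalization layer and a ReLU activation function, and expanded to size C×H×W by several diffusion convolution layers. Finally, to expand the information more appropriately, the feature map of the watermark information is extracted by several SE attention modules.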
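The first reshaping step, from a bit string of length L to a 1×h×w map with L = h×w, can be illustrated with plain lists. This is only a sketch of the reshape; the helper name `message_to_map` and the example sizes (L = 64, h = w = 8) are assumptions, and the convolutional and attention stages that follow are not modeled.

```python
import random


def message_to_map(bits, h, w):
    """Reshape a flat list of L = h * w bits into a 1 x h x w nested list."""
    if len(bits) != h * w:
        raise ValueError("message length must equal h * w")
    rows = [bits[r * w:(r + 1) * w] for r in range(h)]
    return [rows]  # a single channel


rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(64)]  # random binary watermark
feature_map = message_to_map(message, 8, 8)
print(len(feature_map), len(feature_map[0]), len(feature_map[0][0]))
```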
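The encoder E with parameters θ_E takes an RGB color image of size 3×H×W, that is, the carrier image I_CO, and the watermark information feature map M_EN as input, and outputs an encoded image of size 3×H×W, that is, the watermark image I_EN. To select channel features better, the encoder uses a hybrid frequency channel attention block comprising multiple SE channel attention modules and one FCA frequency-domain channel attention module. The principle by which the FCA attention module selects multiple frequency components is:

Here the basis function of the discrete cosine transform is used with some constant coefficients removed, which does not affect the result; x_2d is the input of the discrete cosine transform, H is the height of x_2d, W is the width of x_2d, and u ∈ {0, 1, ..., H-1}, v ∈ {0, 1, ..., W-1}. Global average pooling is in effect equivalent to the discrete cosine transform value at u = 0 and v = 0, that is, the lowest-frequency component: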
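The equations belonging to this passage are not reproduced in the text. A form consistent with the surrounding definitions, assuming the standard two-dimensional DCT-II with its normalization constants dropped, is f(u, v) = Σ_i Σ_j x_2d(i, j) · cos(πu/H · (i + 1/2)) · cos(πv/W · (j + 1/2)). The sketch below uses that assumed form to check numerically that the (0, 0) component equals the global average pooling value up to the constant factor H·W; all function names are hypothetical.

```python
import math


def dct_basis(u, v, i, j, H, W):
    """Unnormalized 2D DCT-II basis value (assumed form, constants dropped)."""
    return (math.cos(math.pi * u / H * (i + 0.5))
            * math.cos(math.pi * v / W * (j + 0.5)))


def dct_component(x, u, v):
    """Frequency component (u, v) of a 2D input given as a list of rows."""
    H, W = len(x), len(x[0])
    return sum(x[i][j] * dct_basis(u, v, i, j, H, W)
               for i in range(H) for j in range(W))


def global_average_pool(x):
    """Mean of all entries of a 2D input."""
    H, W = len(x), len(x[0])
    return sum(sum(row) for row in x) / (H * W)


x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
lowest = dct_component(x, 0, 0)  # the basis is 1 everywhere at u = v = 0
print(lowest, global_average_pool(x) * len(x) * len(x[0]))
```

Because the two quantities differ only by the fixed factor H·W, using global average pooling as the channel descriptor amounts to keeping just the lowest-frequency DCT component, which is the observation the FCA branch generalizes to several components.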
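The entire encoder consists of multiple ConvBNReLU convolution blocks with 3×3 kernels, one hybrid frequency channel attention module and one convolution layer with a 1×1 kernel. In the first step, it amplifies the carrier image through a ConvBNReLU convolution block with a 3×3 kernel, then applies the proposed hybrid frequency channel attention module, which keeps the feature map size unchanged, and then uses a ConvBNReLU convolution block with a 3×3 kernel to concentrate the feature maps obtained by the attention module. In the second step, the watermark information feature map obtained from the watermark information processor, together with the previously output carrier image and the feature map obtained by the hybrid frequency channel attention module, is fed into a ConvBNReLU convolution block with a 3×3 kernel for feature fusion. In the third step, the fused feature map and the carrier image delivered by the skip connection are concatenated into a new feature map, which is passed to a convolution layer with a 1×1 kernel to obtain the encoded image I_EN. The encoder is trained to minimize the L2 distance between I_CO and I_EN by updating the parameters θ_E: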
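The equation that should follow this sentence is not reproduced in the text. By analogy with the decoder loss L_D = MSE(M, M_D) given later, a plausible form is L_E1 = MSE(I_CO, I_EN) = MSE(I_CO, E(θ_E, I_CO, M_EN)); the name L_E1 is an assumption, since the text names only L_E2 explicitly. A minimal sketch on flattened pixel lists, with made-up pixel values:

```python
def mse(a, b):
    """Mean squared error between two equally sized flat lists."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


cover = [0.2, 0.5, 0.8, 0.1]      # hypothetical carrier image pixels (I_CO)
encoded = [0.25, 0.45, 0.8, 0.1]  # hypothetical watermark image pixels (I_EN)
loss_e1 = mse(cover, encoded)     # assumed encoder image loss
print(loss_e1)
```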
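The robustness of the entire model is provided by the noise layer. The noise in the noise layer is selected from a specified noise pool; the layer takes the encoded image I_EN as input and outputs a noise image I_NO of the same size. During training, for each batch of input encoded images, the noise layer randomly selects one of the configured noises to apply as distortion, simulating the noise environment of real scenarios.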
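The per-batch selection can be sketched as below. The two distortions are deliberately trivial stand-ins (an identity for the "no noise" option and a hypothetical `zero_out_half` in place of a real distortion); they are assumptions for illustration and are not the JPEG, JPEG-Mask or Crop layers of the patent.

```python
import random


def identity(image):
    """The 'no noise' option: return the image unchanged."""
    return list(image)


def zero_out_half(image):
    """Hypothetical stand-in distortion: blank the second half of the pixels."""
    half = len(image) // 2
    return list(image[:half]) + [0.0] * (len(image) - half)


NOISE_POOL = [identity, zero_out_half]


def noise_layer(batch, rng):
    """Pick ONE distortion for the whole batch and apply it to every image."""
    distortion = rng.choice(NOISE_POOL)
    return [distortion(image) for image in batch], distortion.__name__


rng = random.Random(0)
batch = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
noised, used = noise_layer(batch, rng)
print(used, noised)
```

Drawing the distortion once per batch, rather than once per image, matches the description that one noise is selected for each batch of encoded images.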
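The task of the decoder D with parameters θ_D is to recover the watermark information M_D of length L from the noise image I_NO; this part determines the ability of the entire model to extract watermarks. In the decoding stage, the noise image I_NO is fed into a ConvBNReLU layer with a 3×3 kernel, and the resulting feature map is downsampled through multiple SE attention modules. A convolution layer with a 3×3 kernel then converts the multi-channel tensor into a single-channel tensor, which is reshaped to obtain the decoded watermark information M_D. The goal of decoder training is to minimize the L2 distance between the original watermark information M and M_D by updating the parameters θ_D:
L_D = MSE(M, M_D) = MSE(M, D(θ_D, I_NO))
Because it plays an important role in the bit error rate metric, the loss function L_D takes the largest share of the total loss function.
The adversarial discriminator A consists of multiple ConvBNReLU modules with 3×3 kernels and one global average pooling layer. Under the influence of the adversarial network, the encoder tries to deceive the adversary as much as possible, so that the adversarial discriminator cannot correctly judge I_CO and I_EN, and updates the parameters θ_E to minimize the loss function L_E2, improving the encoding quality of the encoder:
The discriminator with parameters θ_A acts as a binary classifier that must distinguish I_CO from I_EN. The adversary's goal is to minimize the classification loss L_A by updating θ_A:
L_A = log(1 - A(θ_A, E(θ_E, I_CO, M_EN))) + log(A(θ_A, I_CO))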
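As a numeric illustration, the expression for L_A can be evaluated directly from two discriminator scores. The sketch below assumes the discriminator outputs a probability in (0, 1) and clamps the scores to avoid log(0); the function name, the clamping constant and the example scores are assumptions. It evaluates the formula exactly as written and takes no position on the sign convention used during optimization.

```python
import math


def discriminator_loss(score_encoded, score_cover, eps=1e-12):
    """Evaluate log(1 - A(I_EN)) + log(A(I_CO)) for scalar scores."""
    def clamp(p):
        return min(max(p, eps), 1.0 - eps)
    return math.log(1.0 - clamp(score_encoded)) + math.log(clamp(score_cover))


# Hypothetical scores: the discriminator is undecided on both images.
loss_a = discriminator_loss(0.5, 0.5)
print(loss_a)
```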
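The total loss function is a weighted sum of the loss terms, and L_A is the loss function for the adversarial discriminator. λ_E, λ_D and λ_A are the weights of the respective loss functions, set to 1, 10 and 0.0001 during training.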
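The total-loss formula itself is missing from the text. One reading consistent with the stated weights, offered only as an assumption, is L = λ_E · L_E1 + λ_D · L_D + λ_A · L_E2, where L_E1 is the image L2 term, L_D the decoder term and L_E2 the adversarial term seen by the encoder. The sketch below simply evaluates that weighted sum with made-up loss values.

```python
def total_loss(l_e1, l_d, l_e2, lambda_e=1.0, lambda_d=10.0, lambda_a=0.0001):
    """Weighted sum of the three terms (assumed pairing of weights and terms)."""
    return lambda_e * l_e1 + lambda_d * l_d + lambda_a * l_e2


# Hypothetical per-batch loss values.
loss = total_loss(l_e1=0.01, l_d=0.02, l_e2=-0.7)
print(loss)
```

With these weights the decoder term dominates, in line with the statement that L_D takes the largest share of the total loss.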
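The loss design above comprises two parts. One part is the loss functions for the encoder and the decoder: L_D together with the encoder terms, which assist the encoder in generating high-quality watermark images. An L2 loss makes the carrier image and the watermark image as visually similar as possible, and a loss generated by the adversarial discriminator assists the encoder. The other part is the loss function L_A used to train the discriminator.
Embodiment 1
To demonstrate generality, 10,000 images can be randomly selected from the ImageNet image data set as the training set of the model, and 5,000 images each can be randomly selected from the COCO image data set as the validation set and the test set. Before being fed to the model for training, the data set is preprocessed by cropping to 128×128; the batch size is set to 16 and the number of training epochs to 150. The dynamic Adam optimizer is chosen as the optimization algorithm during training, with the learning rate set to 0.001. JPEG compression noise for testing can be implemented with the library functions provided in PIL. During training, the embedding strength of the watermark information is set to 1. To measure the performance of the watermarking algorithm, PSNR and SSIM between the carrier image and the watermark image are used to represent imperceptibility, and the bit error rate between the original watermark information and the watermark information recovered by the decoder is used to represent robustness. The test results after training specifically on JPEG compression noise are shown in Figure 4.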
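Two of the metrics named above, PSNR and bit error rate, are simple enough to sketch with the standard library. The helper names, the 8-bit peak value of 255 and the sample data are assumptions; SSIM is omitted because it needs windowed statistics that do not fit a short example.

```python
import math


def psnr(cover, encoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(cover, encoded)) / len(cover)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def bit_error_rate(sent, recovered):
    """Fraction of watermark bits that differ after decoding."""
    return sum(s != r for s, r in zip(sent, recovered)) / len(sent)


cover = [100.0, 100.0, 100.0, 100.0]
encoded = [101.0, 99.0, 100.0, 100.0]
quality_db = psnr(cover, encoded)
ber = bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0])
print(quality_db, ber)
```

A higher PSNR indicates a less visible watermark, while a lower bit error rate indicates more reliable extraction.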
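Test experiments with other methods trained on JPEG compression noise are reported in Table 1.
Table 1. Test experiments of other methods trained on JPEG compression noise
Settings for training the single-noise and mixed-noise models. A single-noise model is one whose noise layer targets only one kind of noise, so the trained watermarking model is strongly robust only to that noise. Taking JPEG compression as an example, the noise layer is set to no noise, simulated JPEG-Mask and real JPEG compression. The reason for this choice is that real JPEG compression is non-differentiable noise, so its feedback cannot be incorporated into model training, while the simulated JPEG-Mask is only a hand-set JPEG compression template and cannot reproduce the effect of real JPEG compression. Therefore no noise, JPEG-Mask and real JPEG compression are mixed during training to simulate real-world JPEG compression as closely as possible. The JPEG compression strength factor is set to 50.
The experimental results under a variety of noise tests after mixed-noise training are shown in Figure 5. A mixed-noise model sets several noises in the noise layer so that the trained model achieves good robustness to most noises. This embodiment provides one setting for training a mixed-noise model: the noise layer is set to JPEG (Q=50), JPEG-Mask (Q=50), no noise and Crop (p=0.0225). Note that when the noise layer includes geometric noise such as cropping, the watermark information processor must first diffuse the watermark information through a fully connected layer, and a fully connected layer must also be added at the end of the decoder to perform the inverse transform. Table 2 gives test experiments comparing against other methods on a variety of noises under mixed-noise training.
Selection of weights. The preset number of training epochs is 150. After training, the epochs corresponding to several local minima of the total validation loss are selected from the recorded training log, and their weights are imported into the model for testing.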
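One way to read "epochs corresponding to several local minima" is sketched below: find the epochs whose validation loss is lower than both neighbors, then keep the k lowest. The function name, the strict-inequality definition of a local minimum, the zero-based epoch index and the sample losses are all assumptions.

```python
def local_minima_epochs(val_losses, k=3):
    """Return up to k epoch indices that are local minima, lowest loss first."""
    minima = [
        i for i in range(1, len(val_losses) - 1)
        if val_losses[i] < val_losses[i - 1] and val_losses[i] < val_losses[i + 1]
    ]
    return sorted(minima, key=lambda i: val_losses[i])[:k]


# Hypothetical total validation loss recorded once per epoch.
log = [0.9, 0.5, 0.6, 0.4, 0.45, 0.3, 0.35]
chosen = local_minima_epochs(log, k=2)
print(chosen)
```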
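Table 2. Test experiments comparing against other methods on a variety of noises under mixed-noise training
Test method. It should be emphasized that the watermark images used during training differ from those used during testing. During training, the watermark image generated by the encoder is fed directly into the noise layer and takes part in the whole training process. During testing, the weight parameters of the watermark information processor, encoder and decoder are fixed, and the difference I_diff between the watermark image generated by the encoder and the carrier image represents the watermark information. I_diff is multiplied by the watermark embedding strength α and added pixel-wise to the carrier image to generate the watermark image used for testing, that is, I_EN = I_CO + α × I_diff = I_CO + α × (I_EN - I_CO). During training the strength factor α is 1, whereas during testing the strength factor can be adjusted to balance robustness and invisibility for different application environments. Once the test parameters are set, the previously selected training weights are imported for testing, and the results over the test set images are averaged to represent overall test performance.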
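The test-time blending can be written in a few lines. In the formula above, the I_EN on the left-hand side is the image used for testing and the one on the right is the raw encoder output; the sketch keeps them apart with separate names. The function name and pixel values are hypothetical, and clipping to a valid pixel range is omitted.

```python
def apply_strength(cover, encoded, alpha):
    """Return cover + alpha * (encoded - cover), computed pixel-wise."""
    return [c + alpha * (e - c) for c, e in zip(cover, encoded)]


cover = [10.0, 20.0]
encoded = [12.0, 19.0]                          # raw encoder output
stronger = apply_strength(cover, encoded, 2.0)  # doubles the residual
print(stronger)
```

Setting α = 1 reproduces the encoder output and α = 0 returns the carrier image; values above 1 trade invisibility for robustness.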
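Table 3 compares the quality of the encoded images after training separately on each type of noise, with the strength factor adjusted so that the bit error rate is close to 0%.
Table 3. Encoded image quality results
Table 4 shows the test results under different quality factors and different strength factors after training specifically on JPEG compression noise.
Table 4. Test results under different quality factors and different strength factors
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. A person of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Obvious changes or variations derived from them remain within the protection scope of the present invention.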

Claims (8)

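  1. A deep image watermarking method based on hybrid frequency-domain channel attention, characterized by comprising the following steps:
    Step 1: a watermark information processor generates a watermark information feature map;
    Step 2: an encoder generates a watermark image from a carrier image and the watermark information feature map;
    Step 3: a noise layer takes the watermark image as input and generates a noise image through simulated differentiable noise;
    Step 4: a decoder downsamples the noise image to recover the watermark information;
    Step 5: an adversarial discriminator classifies the carrier image and the watermark image so that the encoder generates high-quality watermark images.
  2. The deep image watermarking method based on hybrid frequency-domain channel attention according to claim 1, characterized in that Step 1 is specifically: the watermark information processor takes the watermark information as input, diffuses it to every bit through a fully connected layer, reshapes the diffused watermark information from one dimension into a two-dimensional feature map, and then generates the watermark information feature map through diffusion convolution layers and an attention module.
  3. The deep image watermarking method based on hybrid frequency-domain channel attention according to claim 2, characterized in that Step 2 is specifically: the encoder takes the carrier image and the watermark information feature map as input and generates the watermark image through ConvBNReLU convolution blocks, the hybrid frequency-domain channel attention module and a skip connection.
  4. The deep image watermarking method based on hybrid frequency-domain channel attention according to claim 3, characterized in that the hybrid frequency-domain channel attention module in the encoder consists of two branches, one of which consists of multiple SENet attention modules, the SENet attention module using a global average pooling layer during channel compression, with the lowest-frequency component of the two-dimensional discrete cosine transform as the weight assigned to channel features; the other branch consists of one FCA attention module, which divides 64 frequency-domain components according to the 8×8 blocking used in JPEG compression and selects 16 low-frequency components in zigzag order, starting from the lowest-frequency component, as the compressed weights of the FCA attention module; the feature tensors produced by the FCA attention module branch and the SENet attention module branch are then skip-connected along the channel dimension, and a ConvBNReLU convolution module is used for feature fusion.
  5. The deep image watermarking method based on hybrid frequency-domain channel attention according to claim 4, characterized in that Step 4 is specifically: the decoder takes the noise image as input and recovers the watermark information by downsampling through ConvBNReLU convolution modules and SENet attention modules.
  6. The deep image watermarking method based on hybrid frequency-domain channel attention according to claim 4, characterized in that the loss function for training the encoder includes terms used to assist the encoder in generating high-quality watermark images,

    where I_CO is the carrier image, I_EN is the watermark image, E denotes the encoder, θ_E denotes the parameters of encoder E, and M_EN is the watermark information feature map; A denotes the adversarial discriminator, and θ_A denotes the parameters of adversarial discriminator A.
  7. The deep image watermarking method based on hybrid frequency-domain channel attention according to claim 5, characterized in that the loss function L_D for training the decoder is:
    L_D = MSE(M, M_D) = MSE(M, D(θ_D, I_NO))
    where M is the original watermark information, M_D is the watermark information recovered by decoding, D denotes the decoder, θ_D denotes the parameters of decoder D, and I_NO is the noise image.
  8. The deep image watermarking method based on hybrid frequency-domain channel attention according to claim 6, characterized in that the loss function L_A for training the adversarial discriminator is:
    L_A = log(1 - A(θ_A, E(θ_E, I_CO, M_EN))) + log(A(θ_A, I_CO))
    where A denotes the adversarial discriminator, θ_A denotes the parameters of adversarial discriminator A, E denotes the encoder, θ_E denotes the parameters of encoder E, I_CO is the carrier image, and M_EN is the watermark information feature map.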
PCT/CN2023/101599 2022-08-10 2023-06-21 基于混合频域通道注意力的深度图像水印方法 WO2024032158A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/453,846 US20240054594A1 (en) 2022-08-10 2023-08-22 Method for watermarking depth image based on mixed frequency-domain channel attention

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210955381.3A CN115272044A (zh) 2022-08-10 2022-08-10 基于混合频域通道注意力的深度图像水印方法
CN202210955381.3 2022-08-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/453,846 Continuation US20240054594A1 (en) 2022-08-10 2023-08-22 Method for watermarking depth image based on mixed frequency-domain channel attention

Publications (1)

Publication Number Publication Date
WO2024032158A1 true WO2024032158A1 (zh) 2024-02-15

Family

ID=83750279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101599 WO2024032158A1 (zh) 2022-08-10 2023-06-21 基于混合频域通道注意力的深度图像水印方法

Country Status (2)

Country Link
CN (1) CN115272044A (zh)
WO (1) WO2024032158A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117743768A (zh) * 2024-02-21 2024-03-22 山东大学 基于去噪生成对抗网络和扩散模型的信号去噪方法及系统
CN117876273A (zh) * 2024-03-11 2024-04-12 南京信息工程大学 一种基于可逆生成对抗网络的鲁棒图像处理方法
CN117876273B (zh) * 2024-03-11 2024-06-07 南京信息工程大学 一种基于可逆生成对抗网络的鲁棒图像处理方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272044A (zh) * 2022-08-10 2022-11-01 大连大学 基于混合频域通道注意力的深度图像水印方法
CN115439702B (zh) * 2022-11-08 2023-03-24 武昌理工学院 一种基于频域处理的弱噪声图像分类方法
CN115496973B (zh) * 2022-11-17 2023-02-21 南京信息工程大学 一种基于分块域变换模拟技术的jpeg对抗样本生成方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111491170A (zh) * 2019-01-26 2020-08-04 华为技术有限公司 嵌入水印的方法及水印嵌入装置
KR102277099B1 (ko) * 2020-02-26 2021-07-15 광운대학교 산학협력단 딥 러닝을 이용한 워터마크 및 해상도 적응적 영상 워터마킹 시스템
CN113222800A (zh) * 2021-04-12 2021-08-06 国网江苏省电力有限公司营销服务中心 一种基于深度学习的鲁棒图像水印嵌入与提取方法及系统
CN114529441A (zh) * 2022-01-19 2022-05-24 华南理工大学 一种图像频域数字水印方法、系统、装置及介质
CN114549273A (zh) * 2022-02-28 2022-05-27 中山大学 基于深度神经网络的自适应鲁棒水印嵌入方法及系统
CN115272044A (zh) * 2022-08-10 2022-11-01 大连大学 基于混合频域通道注意力的深度图像水印方法

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117743768A (zh) * 2024-02-21 2024-03-22 山东大学 基于去噪生成对抗网络和扩散模型的信号去噪方法及系统
CN117743768B (zh) * 2024-02-21 2024-05-17 山东大学 基于去噪生成对抗网络和扩散模型的信号去噪方法及系统
CN117876273A (zh) * 2024-03-11 2024-04-12 南京信息工程大学 一种基于可逆生成对抗网络的鲁棒图像处理方法
CN117876273B (zh) * 2024-03-11 2024-06-07 南京信息工程大学 一种基于可逆生成对抗网络的鲁棒图像处理方法

Also Published As

Publication number Publication date
CN115272044A (zh) 2022-11-01

Similar Documents

Publication Publication Date Title
WO2024032158A1 (zh) 基于混合频域通道注意力的深度图像水印方法
Park et al. Double JPEG detection in mixed JPEG quality factors using deep convolutional neural network
CN110232650B (zh) 一种彩色图像水印嵌入方法、检测方法及系统
CN111105376B (zh) 基于双分支神经网络的单曝光高动态范围图像生成方法
WO2021103676A1 (zh) 一种基于整数小波变换的自适应可逆信息隐藏方法
Malonia et al. Digital image watermarking using discrete wavelet transform and arithmetic progression technique
CN111612708A (zh) 一种基于对抗生成网络的图像修复方法
CN114549273A (zh) 基于深度神经网络的自适应鲁棒水印嵌入方法及系统
CN115953321A (zh) 一种基于零次学习的低照度图像增强方法
Fang et al. Encoded feature enhancement in watermarking network for distortion in real scenes
US20220335560A1 (en) Watermark-Based Image Reconstruction
CN108616757B (zh) 一种翻拍后能提取水印的视频水印嵌入与提取方法
Ponomarenko et al. Sharpness metric for no-reference image visual quality assessment
CN116342362B (zh) 深度学习增强数字水印不可感知性方法
CN115880125B (zh) 基于Transformer的软融合鲁棒图像水印方法
CN117274023A (zh) 一种使用聚频dct变换引导的抗多噪音水印方法
CN116452401A (zh) 一种抗图像攻击的可逆鲁棒水印嵌入与提取模型构建方法
CN116503230A (zh) 一种基于双通道的鲁棒图像水印算法
US20240054594A1 (en) Method for watermarking depth image based on mixed frequency-domain channel attention
CN114529442A (zh) 一种采用两阶段预编码和小波网络的鲁棒图像水印方法
CN114119330A (zh) 一种基于神经网络的鲁棒数字水印嵌入、提取方法
Mirza et al. Digital video watermarking based on principal component analysis
CN117255232B (zh) 基于自注意力机制的dwt域鲁棒视频水印方法及系统
Zhong et al. Enhanced Attention Mechanism-Based Image Watermarking With Simulated JPEG Compression
Hua et al. Dual channel watermarking—A filter perspective

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23851385

Country of ref document: EP

Kind code of ref document: A1