CN115115500A - Watermark embedding method combined with underwater image enhancement - Google Patents

Watermark embedding method combined with underwater image enhancement

Info

Publication number
CN115115500A
Authority
CN
China
Prior art keywords
watermark
image
feature map
enhancement
underwater
Prior art date
Legal status
Withdrawn
Application number
CN202210852829.9A
Other languages
Chinese (zh)
Inventor
骆挺
吴俊
何周燕
徐海勇
宋洋
Current Assignee
College of Science and Technology of Ningbo University
Original Assignee
College of Science and Technology of Ningbo University
Priority date
Filing date
Publication date
Application filed by College of Science and Technology of Ningbo University filed Critical College of Science and Technology of Ningbo University
Priority to CN202210852829.9A
Publication of CN115115500A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 - General purpose image data processing
    • G06T1/0021 - Image watermarking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 - General purpose image data processing
    • G06T2201/005 - Image watermarking
    • G06T2201/0065 - Extraction of an embedded watermark; Reliable detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20112 - Image segmentation details
    • G06T2207/20132 - Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a watermark embedding method combined with underwater image enhancement, which comprises the following steps: a watermark encoder combining image enhancement is provided, integrating watermark embedding and image enhancement into a unified structure so that the enhancement process is taken into account during embedding; a residual attention module is added to the watermark encoder to strengthen attention to quality-degraded regions; the watermark information is redundantly embedded five times; a multi-scale downsampling fusion discriminator is constructed; and the discriminator, the watermark encoder and the watermark extractor are jointly trained until the loss function converges. Through the watermark encoder, the method comprehensively considers both the embedding of the watermark information and the enhancement of the original underwater image; the residual attention module adjusts the deep feature representation of the original underwater image, improving the visual quality of the watermark image and the robustness of the watermark embedding; and the joint training of the watermark encoder, the watermark extractor and the discriminator further improves the visual quality of the watermark image and the robustness of the watermark information.

Description

Watermark embedding method combined with underwater image enhancement
Technical Field
The invention belongs to the field of image watermarking, and particularly relates to a watermark embedding method combined with underwater image enhancement.
Background
Due to the selective absorption of light waves by the water medium and the scattering effect of particles in water, captured underwater images usually suffer from blurred details, low contrast, color distortion and the like. To address this problem, underwater image enhancement adjusts contrast, improves sharpness and corrects color through methods based on physical and non-physical models, so as to improve the visual quality of the image. However, with the development of multimedia and network technologies, enhanced images may be copied and tampered with by illegal users during transmission or sharing, so the corresponding copyrights need to be protected. Digital image watermarking technology can embed extra information into an image, effectively resolving copyright disputes.
Watermarking technologies can generally be divided into fragile watermarks and robust watermarks. Fragile watermarks are sensitive to image modification and are mainly used for image tampering detection. A robust watermark embeds the copyright identification of an image into the original underwater image such that relatively complete watermark information can still be extracted from the image after general image processing or malicious attacks. Conventional robust watermarks are generally based on a transform domain, embedding the watermark into the image by modifying the corresponding transform coefficients. Such methods have certain robustness against common image attacks; however, their generalization capability across different attacks is weak.
Deep learning exhibits excellent performance in different fields of computer vision and natural language processing, benefiting from its powerful feature extraction capability. With this in mind, researchers began exploring deep learning-based watermarking frameworks. Before the advent of deep learning techniques, most watermarking methods used machine learning tools to improve performance. Machine learning tools can improve the invisibility of a watermarking method and its robustness against geometric attacks, but the features used for training must be extracted manually, which greatly limits the achievable performance. Later, with the development of deep learning technology, watermarking techniques based on deep networks were proposed, which better address the generalization capability of the watermark against different attacks. These works aim to design end-to-end trainable deep watermarking models, improving invisibility and robustness by designing reasonable loss functions and adding a noise layer. However, embedding a watermark degrades the enhanced image, so this process conflicts with image enhancement. How to take image enhancement into account during watermark embedding has not been discussed in depth in existing watermarking methods.
Disclosure of Invention
The present invention aims to provide a watermark embedding method combined with underwater image enhancement to solve the problems of the prior art.
In order to achieve the above object, the present invention provides a watermark embedding method combined with underwater image enhancement, comprising:
acquiring an original underwater image, constructing a watermark encoder combined with image enhancement, and carrying out image enhancement and embedding initial binary watermark information on the original underwater image based on the watermark encoder to acquire a watermark image with an enhancement effect;
constructing a noise layer, inputting the watermark image and the original underwater image into the noise layer, and obtaining a noise image;
constructing a watermark extractor, extracting and decoding the watermark information of the noise image based on the watermark extractor, and obtaining target binary watermark information;
constructing a discriminator, and scoring the watermark image and the label image;
constructing a multi-modal loss function, and evaluating losses on the global content, color and texture information of the watermark image as well as a loss for watermark robustness;
and jointly training the watermark encoder combined with enhancement and the watermark extractor, training alternately with the discriminator, and updating iteratively during training until the multi-modal loss function converges.
Optionally, the watermark encoder includes: 5 downsampling convolution blocks, 5 upsampling convolution blocks, two normal convolution blocks, and a residual attention module.
Optionally, in the process of acquiring the noise image: the image is attacked by any one of Crop(p%), Cropout(p%), Dropout(p%), Resize(scale), or JPEG(Q), wherein Crop(p%) represents randomly cropping the image by a percentage of p%; Cropout(p%) means that a p% pixel region is selected from the watermark image, a (1-p%) pixel region is selected from the original underwater image, and the two are spliced into a new image; Dropout(p%) indicates that watermark image pixels are retained at a percentage of p%, with the remaining pixels filled from the original underwater image; Resize(scale) means enlarging or reducing the image by the factor scale; JPEG(Q) denotes JPEG compression of the image with quality factor Q.
Optionally, the watermark extractor includes: several convolution blocks, a global average pooling (GAP) layer, and a fully connected layer.
Optionally, the discriminator includes 4 convolution blocks and 1 convolution layer and adopts a multi-scale downsampling fusion strategy with a Markov Patch-GAN architecture, wherein the input of the first convolution block is the watermark image and the label image, the input of each of the second to fourth convolution blocks is the output of the previous convolution block together with a feature map of the label image downsampled to the corresponding size, and the convolution layer outputs a score.
Optionally, the watermark image is obtained as follows: the original underwater image passes through a first convolution block to obtain a first feature map; the watermark information is copied and expanded to the same size as the first feature map, channel-spliced with it, and the spliced result is input into a second convolution block to obtain a second feature map; the watermark information is copied and expanded to the same size as the second feature map, channel-spliced with it, and the spliced result is input into a first downsampling convolution block; a first loop process is then entered, sequentially obtaining a third, fourth, fifth and sixth feature map; the sixth feature map is input into a fifth downsampling convolution block to obtain a seventh feature map; the seventh feature map is input into the residual attention module to adjust the feature representation, obtaining an eighth feature map; a second loop process is then entered, sequentially obtaining a ninth, tenth, eleventh and twelfth feature map and the watermark image, wherein skip connections exist between mirrored layers, and the connected features are first channel-spliced and then input into the upsampling convolution block.
Optionally, the process by which the residual attention module adjusts the feature representation includes: calculating a one-dimensional channel attention using the interdependencies among channels, and multiplying the channel attention element-wise with the input to obtain a channel-attention-adjusted feature map; calculating a two-dimensional spatial attention using the interdependencies of spatial regions, and multiplying the spatial attention element-wise with the channel-attention-adjusted feature map; and finally performing a residual connection between the input feature map and the output feature map to obtain a residual attention feature map.
Optionally, the multi-modal loss function includes: the image global similarity loss, represented by the mean square error

$$\mathcal{L}_{i} = \frac{1}{CHW} \left\| GT - I_{en} \right\|_{2}^{2}$$

the perceptual loss, obtained by inputting the watermark image and the label image into a pre-trained VGG-19 network and extracting the high-level features output by the block5_conv2 layer, whose difference is calculated as

$$\mathcal{L}_{p} = \frac{1}{C_{\phi} H_{\phi} W_{\phi}} \left\| \phi(GT) - \phi(I_{en}) \right\|_{2}^{2}$$

the watermark loss, which minimizes the mean square error between the initial binary watermark information and the target binary watermark information, expressed as

$$\mathcal{L}_{m} = \frac{1}{L} \left\| M_{in} - M_{out} \right\|_{2}^{2}$$

and the adversarial loss of the watermark encoder, taken from the output of the discriminator on the watermark image and expressed as

$$\mathcal{L}_{adv} = \log f_{D}(I_{en}; \theta_{D})$$

where GT denotes the label image, I_en the watermark image, C the number of channels of the image, H the height of the image, W the width of the image, φ(·) the block5_conv2 feature extraction with output size C_φ × H_φ × W_φ, L the watermark length, M_in the initial binary watermark information, M_out the target binary watermark information, and θ_D the discriminator parameters. The goal of the joint training is to minimize the loss

$$\mathcal{L}_{WWE,E} = \lambda_{i} \mathcal{L}_{i} + \lambda_{p} \mathcal{L}_{p} + \lambda_{m} \mathcal{L}_{m} + \lambda_{adv} \mathcal{L}_{adv}$$

where λ_i, λ_p, λ_m and λ_adv are the relative weights of the respective terms, set to 1.0, 0.5, 0.2 and 0.3; during discriminator training, the following loss is minimized:

$$\mathcal{L}_{D} = \log f_{D}(GT; \theta_{D}) + \log \left( 1 - f_{D}(I_{en}; \theta_{D}) \right)$$
optionally, the first cyclic process is as follows: and copying and expanding the watermark information to the size which is the same as that of the Nth feature map, and performing channel splicing with the Nth feature map to be used as the input of an (N-1) th downsampling convolution block to obtain an (N + 1) th feature map, wherein N is from 2 to 5.
Optionally, the second loop process is as follows: the M-th feature map is input into the (M-7)-th upsampling convolution block to obtain the (M+1)-th feature map, where M runs from 8 to 11, and the twelfth feature map is input into the fifth upsampling convolution block to obtain the watermark image.
The invention has the technical effects that:
the method comprehensively considers the embedding of the watermark information and the enhancement of the original underwater image through the watermark encoder, adjusts the depth characteristic representation of the original underwater image through the residual error attention module, improves the visual quality of the watermark image and the robustness of the watermark embedding, and further improves the visual quality and the robustness of the watermark information through the combined training of the watermark encoder, the watermark extractor and the discriminator.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a network structure of a discriminator in an embodiment of the present invention;
FIG. 2 is a block diagram of a residual attention module according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Example one
As shown in fig. 1-2, the present embodiment provides a watermark embedding method combined with underwater image enhancement, including:
the model consists of four parts: (1) parameter is theta WWE In combination with the image enhancement watermark encoder WWE, the original underwater image I co And watermark information M in As input, and generates a watermark image I en (ii) a (2) A noise layer, the original underwater image and the watermark image as input, and generating a noise image I no (3) Parameter is theta E The watermark extractor E of (1), extracting the noise image I no As input, and extracts watermark information M out (4) Parameter is theta D The discriminator D of (1) for watermarking the image I en And label image GT as input, and outputs different scores S to discriminate I en The image quality of (a).
Watermark encoder WWE combined with image enhancement: WWE is based on U-Net, a U-shaped network with skip connections between mirrored layers. The idea of skip connections has proven very effective for image-to-image translation and image quality enhancement problems. Therefore, in the WWE network this structure is used to accomplish both watermark embedding and image enhancement.
Let the original underwater image be I_co with size C × H × W, where each pixel value lies in the range {0, ..., 255}. The embedded binary watermark information is a one-dimensional vector, denoted M_in ∈ {0,1}^L, where L represents the length of the binary watermark information and is used to control the watermark capacity. The watermark encoder WWE combined with image enhancement performs enhancement processing on the original underwater image and embeds the watermark information into the enhanced image to generate the watermark image I_en. This process can be expressed as:

$$I_{en} = f_{WWE}(I_{co}, M_{in}; \theta_{WWE})$$
Convolution blocks form the basic components of the network; each convolution block comprises a convolution layer (conv), an activation function and a batch normalization layer (BN). A Leaky Rectified Linear Unit (LeakyReLU) with a negative slope of 0.2 is used as the activation function in the WWE downsampling process, while the upsampling process uses a Rectified Linear Unit (ReLU). The ConvBlocks in the model have a convolution kernel size of 3 × 3, a stride of 2 and padding of 1 during downsampling, and a convolution kernel size of 3 × 3, a stride of 1 and padding of 1 during upsampling.
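For illustration, a minimal PyTorch sketch of such a convolution block follows; the module and argument names are illustrative, and the nearest-neighbour interpolation used to restore spatial size in the upsampling variant is an assumption, since the description only specifies the convolution parameters (the plain blocks are assumed to share the downsampling path's LeakyReLU):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Basic block: conv -> activation -> BN, in the order listed in the
    description. mode is one of "plain", "down", "up"."""
    def __init__(self, in_ch: int, out_ch: int, mode: str = "plain"):
        super().__init__()
        layers = []
        if mode == "up":
            # assumed: interpolation doubles the spatial size before the
            # stride-1 convolution specified for upsampling
            layers.append(nn.Upsample(scale_factor=2, mode="nearest"))
        stride = 2 if mode == "down" else 1        # 3x3 kernel, padding 1
        act = nn.ReLU() if mode == "up" else nn.LeakyReLU(0.2)
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            act,
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```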
In the downsampling process, the multiple convolution kernels in each convolution block expand the number of channels of the image, providing richer features for image enhancement and more potential embedding positions for the watermark. However, the features captured by the convolution kernels have only weak spatial and channel correlations. Certain channels or spatial locations may exhibit poor invisibility for the embedded watermark, and different channels and spatial regions also suffer from inconsistent attenuation during image enhancement. To solve these problems, a Residual Attention Module (RAM) is designed, which enables the network to focus on image quality degradation areas and watermark-invisible areas and to give these areas greater weight. It is placed after the last convolution block of the downsampling path to enhance the performance of the WWE network.
In the watermark encoder combined with image enhancement, ConvBlock denotes an ordinary convolution block, ConvBlock-do and ConvBlock-up denote a downsampling convolution block and an upsampling convolution block, respectively, and RAM denotes the residual attention module. The kernel information is given in the form "number of output channels × (convolution kernel height × convolution kernel width × number of input channels)". The watermark encoder WWE combined with image enhancement consists of 5 downsampling convolution blocks, 5 upsampling convolution blocks, 2 ordinary convolution blocks and one residual attention module. During downsampling, the number of channels of the convolution layers gradually increases; conversely, during upsampling it gradually decreases. The original underwater image I_co passes through the first ConvBlock to obtain the feature representation I_co1. This step initially extracts texture features of the image and provides embedding locations for the watermark. Then, the binary watermark information M_in is expanded by replication to the same size as I_co1, channel-spliced with I_co1, and processed by the second ConvBlock to obtain I_co2. Next, M_in is expanded by replication to the same size as I_co2, channel-spliced with I_co2, and processed by the first ConvBlock-do to obtain I_co3. Similarly, the second to fourth ConvBlock-do take as input the replicated and expanded binary watermark information channel-spliced with the output feature map of the previous ConvBlock-do, and output I_co4, I_co5 and I_co6, respectively. The last ConvBlock-do takes only the output feature map of the previous ConvBlock-do as input and generates I_co7; I_co7 is then input into the RAM to adjust the feature representation, obtaining I_co8. Finally, I_co8 is input into the upsampling path, and the first to fifth ConvBlock-up output I_co9, I_co10, I_co11, I_co12 and I_en, respectively. In addition, this stage has skip connections between mirrored layers, namely between (I_co9, I_co6), (I_co10, I_co5), (I_co11, I_co4) and (I_co12, I_co3); specifically, the connected features are channel-spliced and then input into the ConvBlock-up. In order not to affect the image enhancement effect, the watermark information is embedded during the downsampling process. The watermark information is embedded five times in total, which adds redundancy to enhance the robustness of watermark extraction. The whole process essentially jointly encodes the watermark information and the image enhancement features using convolution operations, so as to obtain a watermark image with an enhancement effect. The repeated "copy and expand, then channel-splice" step is sketched below.
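The helper below is hypothetical; it assumes each watermark bit is replicated into one constant feature channel matching the current feature map's spatial size:

```python
import torch

def expand_and_concat(feat: torch.Tensor, msg: torch.Tensor) -> torch.Tensor:
    """Replicate an L-bit watermark to the spatial size of `feat` and
    splice it along the channel axis.

    feat: (B, C, H, W) feature map; msg: (B, L) binary watermark.
    Returns a (B, C + L, H, W) tensor fed to the next ConvBlock(-do).
    """
    _, _, h, w = feat.shape
    msg_map = msg.float()[:, :, None, None].expand(-1, -1, h, w)
    return torch.cat([feat, msg_map], dim=1)
```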
Noise layer: the noise layer is key to improving the robustness of the watermark model. During transmission of the watermark image over a communication channel, various image processing attacks inevitably occur. In order to extract complete watermark information, different attacks are added during training, which effectively improves the robustness of the watermark model against specific real attacks. Therefore, a noise sub-network is designed that simulates various attacks as a differentiable network layer in the iterative training of the network, with only one attack randomly selected in each training loop.
The noise layer includes five attacks: Crop(p%), Cropout(p%), Dropout(p%), Resize(scale) and JPEG(Q). Crop(p%) means randomly cropping the image by a percentage of p%. Cropout(p%) means selecting a p% pixel area from the watermark image I_en, selecting a (1-p%) pixel area from the original underwater image I_co, and splicing the two into a new image. Dropout(p%) means retaining watermark image I_en pixels at a percentage of p%, with the remaining pixels filled from the original underwater image I_co. Resize(scale) means enlarging or reducing the image by the factor scale. JPEG(Q) denotes JPEG compression of the image with quality factor Q.
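A minimal sketch of such a noise layer follows; the default parameter values are illustrative, the rectangular masks for Crop/Cropout are one plausible reading of the description, and real JPEG(Q) compression is left as a placeholder because it is not differentiable as written:

```python
import random
import torch
import torch.nn.functional as F

def noise_layer(i_en: torch.Tensor, i_co: torch.Tensor,
                p: float = 0.3, scale: float = 0.8, q: int = 50):
    """Apply one randomly chosen attack per training iteration.
    i_en: watermark image, i_co: original underwater image, both (B, C, H, W).
    """
    attack = random.choice(["crop", "cropout", "dropout", "resize", "jpeg"])
    b, c, h, w = i_en.shape
    ch, cw = int(h * p ** 0.5), int(w * p ** 0.5)   # p-percent-area rectangle
    if attack == "crop":
        # keep a random p% sub-rectangle of the watermark image
        y, x = random.randint(0, h - ch), random.randint(0, w - cw)
        return i_en[:, :, y:y + ch, x:x + cw]
    if attack == "cropout":
        # p% rectangle from the watermark image, the rest from the original
        mask = torch.zeros(1, 1, h, w, device=i_en.device)
        y, x = random.randint(0, h - ch), random.randint(0, w - cw)
        mask[:, :, y:y + ch, x:x + cw] = 1.0
        return i_en * mask + i_co * (1 - mask)
    if attack == "dropout":
        # keep p% of watermark-image pixels, fill the rest from the original
        mask = (torch.rand(b, 1, h, w, device=i_en.device) < p).float()
        return i_en * mask + i_co * (1 - mask)
    if attack == "resize":
        return F.interpolate(i_en, scale_factor=scale, mode="bilinear",
                             align_corners=False)
    # "jpeg": JPEG(Q) with quality factor q is non-differentiable; a
    # differentiable approximation would be substituted here during training
    return i_en
```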
The watermark extractor E: the extractor E learns to decode the watermark information, extracting the watermark information M_out from the received noise image I_no. This process can be expressed as:

$$M_{out} = f_{E}(I_{no}; \theta_{E})$$
Here ConvBlock is the same as in the WWE network, ConvBlock-K denotes K identical ConvBlock convolution blocks, GAP denotes global average pooling, and FC denotes a fully connected layer. The noise image I_no passes through the first ConvBlock to obtain a 64-channel feature representation I_no1. Then, I_no1 is input into the K identical ConvBlocks to obtain the 64-channel feature map representation I_no2. The purpose of this step is to extract rich deep features of the image from which the embedded watermark information can be recovered. Next, I_no2 is input into the GAP layer for global average pooling, obtaining a 1 × 1 × 64 tensor I_no3. Finally, I_no3 is reshaped into a 1 × 64 one-dimensional vector and processed by the FC layer to generate the final binary watermark information of length L. The essence of watermark reconstruction is to extract the watermark information from different levels of image features.
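A sketch of the extractor under these assumptions follows; the activation inside the blocks, K = 7 and the watermark length L = 30 are illustrative choices rather than values fixed by the description:

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # plain 3x3 / stride-1 convolution block (conv -> LeakyReLU -> BN);
    # the activation choice for the extractor is an assumption
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, 1, 1),
        nn.LeakyReLU(0.2),
        nn.BatchNorm2d(out_ch),
    )

class Extractor(nn.Module):
    """First ConvBlock -> K identical ConvBlocks -> GAP -> FC."""
    def __init__(self, in_ch: int = 3, width: int = 64,
                 k: int = 7, msg_len: int = 30):
        super().__init__()
        blocks = [conv_block(in_ch, width)]
        blocks += [conv_block(width, width) for _ in range(k)]
        self.features = nn.Sequential(*blocks)
        self.gap = nn.AdaptiveAvgPool2d(1)    # global average pooling
        self.fc = nn.Linear(width, msg_len)   # L-bit prediction

    def forward(self, i_no: torch.Tensor) -> torch.Tensor:
        f = self.gap(self.features(i_no)).flatten(1)   # (B, 64)
        return torch.sigmoid(self.fc(f))               # floats in (0, 1)
```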
The discriminator D: the discriminator D scores the watermark image I_en against the label image GT and outputs a score S, which can be expressed as:

$$S = f_{D}(I_{en}, GT; \theta_{D})$$
The purpose of the discriminator is to separate the two inputs by assigning the watermark image I_en a higher score and the label image GT a lower score, which in turn drives the encoder to enhance the similarity between the two. The discriminative power of the discriminator inevitably affects the performance of WWE, since the two are updated in a competitive relationship. Therefore, in order to improve the performance of WWE, a multi-scale downsampling fusion strategy is proposed in the model, and a Markov Patch-GAN architecture is adopted. This architecture assumes that image pixels are independent beyond the patch size, i.e., discrimination is based only on patch-level information. This assumption is important for capturing high-frequency characteristics such as local texture and style. As shown in fig. 1, the input of the first ConvBlock is the watermark image and the label image; the input of each of the second to fourth ConvBlocks is the output of the previous ConvBlock together with the label image downsampled to the corresponding size, and this downsampling of the label image GT is done using the same ConvBlock. Finally, a convolution operation is used to output a score map of size 16 × 16 × 1. The convolution kernel size in each convolution block is 3 × 3 with stride 2 and padding 1, using the ReLU activation function and batch normalization (BN). A sketch under these assumptions follows.
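The channel width below is illustrative; with a 256 × 256 input, the four stride-2 stages plus the final convolution yield the stated 16 × 16 × 1 score map:

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # 3x3 convolution, stride 2, padding 1, with ReLU and BN as described
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, 2, 1),
        nn.ReLU(),
        nn.BatchNorm2d(out_ch),
    )

class Discriminator(nn.Module):
    """Markov Patch-GAN critic: each stage fuses the previous stage's
    output with a same-size ConvBlock-downsampled view of the label image."""
    def __init__(self, in_ch: int = 3, w: int = 64):
        super().__init__()
        self.stage1 = conv_block(in_ch, w)
        self.stage2 = conv_block(2 * w, w)
        self.stage3 = conv_block(2 * w, w)
        self.stage4 = conv_block(2 * w, w)
        # parallel ConvBlocks that downsample the label image GT
        self.gt1 = conv_block(in_ch, w)
        self.gt2 = conv_block(w, w)
        self.gt3 = conv_block(w, w)
        self.score = nn.Conv2d(w, 1, 3, 1, 1)   # patch-level score map

    def forward(self, img: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
        g1 = self.gt1(gt)
        g2 = self.gt2(g1)
        g3 = self.gt3(g2)
        x = self.stage1(img)
        x = self.stage2(torch.cat([x, g1], dim=1))
        x = self.stage3(torch.cat([x, g2], dim=1))
        x = self.stage4(torch.cat([x, g3], dim=1))
        return torch.sigmoid(self.score(x))      # 16x16x1 for 256x256 input
```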
Residual attention module
An attention mechanism in deep learning enables the network to focus on important features and ignore irrelevant ones. It is applied in the present model, and the attention module with residual connection (RAM) adjusts the weights according to the importance of space and channel. Taking an input feature map F of size C × H × W computed by a convolution block as an example, the module first calculates a one-dimensional channel attention using the interdependencies between channels, and multiplies the channel attention element-wise with the input to obtain the channel-attention-adjusted feature map Q of size C × H × W. Second, it calculates a two-dimensional spatial attention of Q using the interdependencies of the spatial regions, multiplies the spatial attention element-wise with Q, and outputs a feature map U of size C × H × W that finally contains both channel attention and spatial attention. Finally, residual connection of the input feature map F and the output feature map U yields the residual attention feature map F' of size C × H × W. The structure of the residual attention module is shown in fig. 2.
For channel attention, spatial information of the input feature map F is first aggregated using average pooling and max pooling along the spatial direction, giving F^c_avg and F^c_max, both of size C × 1 × 1, which can be calculated with the following formula:

$$F^{c}_{avg} = \mathrm{AvgPool}(F), \qquad F^{c}_{max} = \mathrm{MaxPool}(F)$$
the two pooling operations being per channel
Figure BDA0003752403170000114
Global information is compressed into two scalars as a representation of spatial features. To model the correlation between each channel, propagation is performed using a shared network consisting of two fully-connected channels
Figure BDA0003752403170000115
And
Figure BDA0003752403170000116
and then, fusing the two feature vectors through element addition, and converting the fused feature vectors into channel attention through a sigmoid function. The channel attention CA is obtained by:
Figure BDA0003752403170000117
wherein σ (-) denotes a sigmoid function, δ (-) denotes a ReLU function,
Figure BDA0003752403170000118
and
Figure BDA0003752403170000119
the weights of the two fully connected layers are represented separately, and r is set to 16 in order to reduce the model computation cost. Finally, each element of CA is multiplied by each pass of F to compute Q, which can be expressed as:
Figure BDA00037524031700001110
wherein the content of the first and second substances,
Figure BDA00037524031700001111
representing pixel multiplication. Thus, a feature map after the attention of the channel is adjusted is obtained.
Similar to channel attention, channel information of the input feature map Q is first aggregated using average pooling and max pooling along the channel direction, giving Q^s_avg and Q^s_max, both of size 1 × H × W, which can be calculated with the following formula:

$$Q^{s}_{avg} = \mathrm{AvgPool}(Q), \qquad Q^{s}_{max} = \mathrm{MaxPool}(Q)$$
these two pooling operations compress all channel information into one channel as a representation of the channel characteristics. Next, the process of the present invention is described,
Figure BDA0003752403170000121
and
Figure BDA0003752403170000122
and (4) carrying out channel splicing, and obtaining the space attention through the convolution layer and the sigmoid function. The calculation of spatial attention SA may be expressed as:
Figure BDA0003752403170000123
wherein σ (·) represents a sigmoid function, Conv (·) represents convolution operation, and-represents channel splicing. In the convolution operation, the size of the convolution kernel is set to 7 × 7. Finally, each element of SA is multiplied by the element at the corresponding position of Q to calculate U, which can be expressed as U-SA × Q
Furthermore, to avoid the gradient vanishing problem and preserve the good characteristics of the original features, a residual connection is added: adding F to U yields F'. The overall process of the residual attention module can be expressed as the function:

$$F' = \mathrm{RAM}(F) = F + U$$
Thus, when a feature F passes through the RAM, important spatial regions and channels are given greater weight, and vice versa. F' has a stronger representation capability for image enhancement and watermark embedding.
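A minimal sketch of the RAM, assuming the CBAM-style pooling and the 7 × 7 spatial convolution described above:

```python
import torch
import torch.nn as nn

class ResidualAttentionModule(nn.Module):
    """Channel attention -> spatial attention -> residual connection,
    following the formulas above."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        # shared two-layer fully connected network for channel attention
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(),
            nn.Linear(channels // r, channels),
        )
        # 7x7 convolution over the two concatenated spatial pooling maps
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # channel attention: CA = sigmoid(MLP(F_avg^c) + MLP(F_max^c))
        f_avg = f.mean(dim=(2, 3))               # (B, C) average pooling
        f_max = f.amax(dim=(2, 3))               # (B, C) max pooling
        ca = torch.sigmoid(self.mlp(f_avg) + self.mlp(f_max))
        q = f * ca[:, :, None, None]             # Q = CA (x) F
        # spatial attention: SA = sigmoid(Conv([Q_avg^s; Q_max^s]))
        q_avg = q.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        q_max = q.amax(dim=1, keepdim=True)      # (B, 1, H, W)
        sa = torch.sigmoid(self.conv(torch.cat([q_avg, q_max], dim=1)))
        u = sa * q                               # U = SA (x) Q
        return f + u                             # F' = F + U
```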
Loss function
In the proposed model, the watermark encoder WWE combined with image enhancement and the extractor E work jointly in an end-to-end manner and are updated synchronously during training; the discriminator is optimized alternately, based on the idea of mutual adversarial training. The basic requirement of digital image watermarking technology is to ensure that the original underwater image I_co and the watermark image I_en are visually indistinguishable. However, unlike general digital image watermarking, the image enhancement process is taken into account when embedding the watermark, so the requirement of this model is to enhance the similarity between the watermark image I_en and the label image GT. To achieve this goal, the mean square error is used to represent the image global similarity loss:

$$\mathcal{L}_{i} = \frac{1}{CHW} \left\| GT - I_{en} \right\|_{2}^{2}$$
further, in order to encourage the generation of a watermark image whose content (i.e., feature representation) is similar to the tag image GT image, the watermark image I en And inputting the label image GT into a pre-trained VGG-19 network, and then respectively extracting the high-level features of the block5_ conv2 layer output and minimizing the difference between the high-level features and the high-level features. This difference is called the perceptual loss and is calculated as follows:
Figure BDA0003752403170000131
watermarking techniques require accurate extraction of watermark information from the watermark image. Embedded watermark information M in Each value of which is 0 or 1, and extracted watermark information M out Is a floating point number between 0 and 1. Minimizing M during training using mean square error loss in And M out The difference between them:
Figure BDA0003752403170000132
when the model training is completed and the model is actually applied, M is required to be added out Rounded to 0 or 1 to construct the true binary sequence. As described above, discriminator D employs a Markov Patch-GAN architecture that is effective in capturing high frequency information about texture and style. Thus, a discriminator is used on a watermark imageThe output of (c) is used as an adversarial loss of WWE encoder to enhance local texture and style consistency. It can be expressed as:
Figure BDA0003752403170000133
in summary, the training goal of the combined image enhanced watermark encoder WWE and extractor E is to minimize:
Figure BDA0003752403170000134
wherein λ is i ,λ p ,λ m ,λ adv The relative weight of each item is expressed and set to 1.0, 0.5, 0.2 and 0.3, respectively, according to the experimental results.
The discriminator strives to reduce the prediction score of the label image GT and to enlarge the prediction score of the watermark image I_en. To train the discriminator D, the following loss is minimized:

$$\mathcal{L}_{D} = \log f_{D}(GT; \theta_{D}) + \log \left( 1 - f_{D}(I_{en}; \theta_{D}) \right)$$
the enhanced watermark encoder WWE and extractor E are jointly trained and alternately trained with the discriminator D until the loss function converges. Wherein WWE and E together minimize a loss function
Figure BDA0003752403170000136
And D is responsible for minimizing
Figure BDA0003752403170000137
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A watermark embedding method in combination with underwater image enhancement, comprising the steps of:
acquiring an original underwater image, constructing a watermark encoder combined with image enhancement, and carrying out image enhancement and embedding initial binary watermark information on the original underwater image based on the watermark encoder to acquire a watermark image with an enhancement effect;
constructing a noise layer, inputting the watermark image and the original underwater image into the noise layer, and obtaining a noise image;
constructing a watermark extractor, extracting and decoding the watermark information of the noise image based on the watermark extractor, and obtaining target binary watermark information;
constructing a discriminator, and scoring the watermark image and the label image;
constructing a multi-modal loss function, and evaluating losses on the global content, color and texture information of the watermark image as well as a loss for watermark robustness;
and jointly training the watermark encoder combined with enhancement and the watermark extractor, training alternately with the discriminator, and updating iteratively during training until the multi-modal loss function converges.
2. The watermark embedding method in combination with underwater image enhancement as claimed in claim 1, wherein the watermark encoder comprises: 5 downsampling convolution blocks, 5 upsampling convolution blocks, two normal convolution blocks, and a residual attention module.
3. The watermark embedding method combined with underwater image enhancement as recited in claim 1, wherein the noise image is obtained as follows: the image is attacked by any one of Crop(p%), Cropout(p%), Dropout(p%), Resize(scale), or JPEG(Q), wherein Crop(p%) represents randomly cropping the image by a percentage of p%; Cropout(p%) means that a p% pixel region is selected from the watermark image, a (1-p%) pixel region is selected from the original underwater image, and the two are spliced into a new image; Dropout(p%) indicates that watermark image pixels are retained at a percentage of p%, with the remaining pixels filled from the original underwater image; Resize(scale) means enlarging or reducing the image by the factor scale; JPEG(Q) denotes JPEG compression of the image with quality factor Q.
4. The watermark embedding method in combination with underwater image enhancement as claimed in claim 1, wherein said watermark extractor comprises: several convolution blocks, a global average pooling (GAP) layer, and a fully connected layer.
5. The watermark embedding method in combination with underwater image enhancement according to claim 1, characterized in that the discriminator comprises 4 convolution blocks and 1 convolution layer and adopts a multi-scale downsampling fusion strategy with a Markov Patch-GAN architecture, wherein the input of the first convolution block is the watermark image and the label image, the input of each of the second to fourth convolution blocks is the output of the previous convolution block together with a feature map of the label image downsampled to the corresponding size, and the convolution layer outputs scores.
6. The watermark embedding method combined with underwater image enhancement as claimed in claim 2, wherein the watermark image is obtained as follows: the original underwater image passes through a first convolution block to obtain a first feature map; the watermark information is copied and expanded to the same size as the first feature map, channel-spliced with it, and the spliced result is input into a second convolution block to obtain a second feature map; the watermark information is copied and expanded to the same size as the second feature map, channel-spliced with it, and the spliced result is input into a first downsampling convolution block; a first loop process is then entered, sequentially obtaining a third, fourth, fifth and sixth feature map; the sixth feature map is input into a fifth downsampling convolution block to obtain a seventh feature map; the seventh feature map is input into the residual attention module to adjust the feature representation, obtaining an eighth feature map; a second loop process is then entered, sequentially obtaining a ninth, tenth, eleventh and twelfth feature map and the watermark image, wherein skip connections exist between mirrored layers, and the connected features are first channel-spliced and then input into the upsampling convolution block.
7. The watermark embedding method in combination with underwater image enhancement as claimed in claim 6, wherein the process by which the residual attention module adjusts the feature representation comprises: calculating a one-dimensional channel attention using the interdependencies among channels, and multiplying the channel attention element-wise with the input to obtain a channel-attention-adjusted feature map; calculating a two-dimensional spatial attention using the interdependencies of spatial regions, and multiplying the spatial attention element-wise with the channel-attention-adjusted feature map; and finally performing a residual connection between the input feature map and the output feature map to obtain a residual attention feature map.
8. The method of claim 1, wherein the multi-modal loss function comprises: representing image global similarity loss by mean square error
Figure FDA0003752403160000031
Inputting the watermark image and the label image into a pre-trained VGG-19 network, respectively extracting high-level features output by a block 5-conv 2 layer, wherein the difference is perception loss, and the calculation method is that
Figure FDA0003752403160000032
The difference between the initial binary watermark information and the target binary watermark information is minimized mean square error, and the expression is
Figure FDA0003752403160000033
The output result of the discriminator on the watermark image is taken as the adversarial loss of the watermark encoder, and is expressed as:
Figure FDA0003752403160000034
wherein GT index labels the image, I en Is a watermark image, C is the number of channels of the image, H is the height of the image, W is the width of the image, M in For the initial binary watermark information, M out For said target binary watermark information, θ D Is a discriminator parameter; the goal of the co-training is to minimize losses, expressed as:
Figure FDA0003752403160000035
wherein λ is i ,λ p ,λ m ,λ adv Relative weights for each term are set to 1.0, 0.5, 0.2, 0.3, respectively; during the discriminator training, the following losses are minimized:
Figure FDA0003752403160000041
9. The watermark embedding method combined with underwater image enhancement as recited in claim 6, wherein the first loop process is: the watermark information is copied and expanded to the same size as the N-th feature map and channel-spliced with the N-th feature map to serve as the input of the (N-1)-th downsampling convolution block, obtaining the (N+1)-th feature map, where N runs from 2 to 5.
10. The watermark embedding method combined with underwater image enhancement as claimed in claim 6, wherein the second loop process is: the M-th feature map is input into the (M-7)-th upsampling convolution block to obtain the (M+1)-th feature map, where M runs from 8 to 11, and the twelfth feature map is input into the fifth upsampling convolution block to obtain the watermark image.
CN202210852829.9A 2022-07-19 2022-07-19 Watermark embedding method combined with underwater image enhancement Withdrawn CN115115500A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210852829.9A CN115115500A (en) 2022-07-19 2022-07-19 Watermark embedding method combined with underwater image enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210852829.9A CN115115500A (en) 2022-07-19 2022-07-19 Watermark embedding method combined with underwater image enhancement

Publications (1)

Publication Number Publication Date
CN115115500A true CN115115500A (en) 2022-09-27

Family

ID=83333922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210852829.9A Withdrawn CN115115500A (en) 2022-07-19 2022-07-19 Watermark embedding method combined with underwater image enhancement

Country Status (1)

Country Link
CN (1) CN115115500A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880125A (en) * 2023-03-02 2023-03-31 宁波大学科学技术学院 Soft fusion robust image watermarking method based on Transformer
CN116152116A (en) * 2023-04-04 2023-05-23 青岛哈尔滨工程大学创新发展中心 Underwater image enhancement method based on visual self-attention model
CN116152116B (en) * 2023-04-04 2023-07-21 青岛哈尔滨工程大学创新发展中心 Underwater image enhancement method based on visual self-attention model
CN116308985A (en) * 2023-05-23 2023-06-23 贵州大学 Robust watermarking method for diffusion tensor image
CN116308985B (en) * 2023-05-23 2023-07-25 贵州大学 Robust watermarking method for diffusion tensor image


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220927