CN114022506B - Image restoration method for edge prior fusion multi-head attention mechanism - Google Patents
- Publication number
- CN114022506B (application CN202111356234.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/13 — Image analysis; Segmentation; Edge detection
- G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2415 — Classification techniques based on parametric or probabilistic models
- G06N3/045 — Neural networks; Combinations of networks
- G06N3/08 — Neural networks; Learning methods
- G06T3/4038 — Image mosaicing, e.g. composing plane images from plane sub-images
- G06T2207/20081 — Indexing scheme for image analysis; Training; Learning
Abstract
The invention relates to the technical field of image restoration and discloses an image restoration method with an edge-prior fused multi-head attention mechanism, comprising: step S1: acquiring experimental data comprising a training set and a test set, preprocessing the data, and extracting an edge map from each preprocessed image; step S2: constructing an edge-prior fused multi-head attention repair model, which comprises an edge repair model and an image repair model, wherein the edge repair model takes the extracted edge map, the original image and a mask image as inputs and outputs a repaired edge image, and the image repair model is trained with the repaired edge image and the defective image as inputs. By fusing a multi-head attention mechanism, the method extracts richer long-distance pixel dependencies and thereby improves the image restoration effect.
Description
Technical Field
The invention relates to the technical field of image restoration, in particular to an image restoration method of an edge prior fusion multi-head attention mechanism.
Background
In the information society, images are among the most important sources of information. How to obtain more complete and clearer images has become a hotspot in the field of computer vision; related applications include image restoration and super-resolution. Image restoration refers to the technique of recovering a complete image from the remaining image information in a corrupted image. For the human eye this is not a laborious task, but for computer vision it is rather challenging. The technique has many practical applications, such as image restoration proper (removing photo scratches and text occlusions), photo editing (removing unwanted objects), and image coding and transmission (restoring image blocks lost to data-packet loss during network transmission). Image restoration has therefore been a very active research field in recent years.
At present, image restoration based on generative adversarial networks has become mainstream. A generative adversarial network lets a network model generate images that resemble the training data but do not actually exist, achieving a convincingly realistic effect, and in recent years improvements to generative adversarial networks that exploit this generative property have continuously advanced image restoration. In the prior art, however, convolutional neural networks attend only to the pixel values of local areas when learning features and ignore the influence that correlations between pixels in remote areas have on image generation and restoration.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention aims to provide an image restoration method with an edge priori fusion multi-head attention mechanism.
In order to achieve the above object, the present invention provides the following technical solutions:
An image restoration method with an edge-prior fused multi-head attention mechanism comprises the following steps:
Step S1: acquiring experimental data and preprocessing the data, wherein the experimental data comprises a training set and a testing set, and extracting an edge map of an image from the preprocessed image;
Step S2: the method comprises the steps of constructing an edge first-fusion multi-attention mechanism repair model, wherein the edge repair model comprises an edge repair model and an image repair model, the edge repair model takes an extracted edge image, an original image and a mask image as inputs, outputs the extracted edge image and the original image as repaired edge images, and the image repair model trains by taking the repaired edge images and the defect images as inputs;
The image restoration model comprises an image restoration device, wherein the image restoration device generates restoration pictures after repeated sampling of restored edge images, repeated residual convolution based on expansion convolution, one multi-head attention network and two deconvolutions;
step S3: and evaluating the result of fusing the multi-attention mechanism repair model on the edge through the test set.
In the invention, further, the edge repair model comprises an edge restorer, which downsamples the extracted edge map, the original image and the mask image, and converts the feature map into a single-channel edge map after multiple dilated-convolution-based residual convolutions and two deconvolutions.
In the invention, further, the edge repair model repair method comprises the following steps:
Step S20: obtaining a predicted edge restoration result of the edge restoration device, obtaining a generation result of an edge restoration model according to the predicted edge restoration result, reserving an image edge of an already-regional area, and filling an edge part needing restoration in the missing region, wherein the generation result comprises the following steps:
Cp=Ge(M,C,Igray)
C+=C·(1-M)+Cp·M
Where Cp represents the predicted edge repair image, Ge represents the edge restorer, M represents the mask image, C represents the edge map of the image to be repaired, Igray represents the gray-scale map of the image to be repaired, and C+ represents the repaired edge image generated by the edge repair model.
In the invention, further, the repairing method of the edge repair model further comprises the following steps:
Step S21, calculating a loss function of the edge restorer, wherein the loss function is a weighted summation of the generated edge countermeasure loss and the edge characteristic loss;
step S22: and optimizing the generation result of the edge restoration model to obtain a restored edge image.
In the present invention, further, the method for repairing the image repairing model includes:
Step S23: obtaining a predicted repair image by using tensors spliced by the repaired edge image and the damaged image as input, and obtaining the repair image according to the predicted repair image:
Ip=Gi(M,C+IM)
I+=I·(1-M)+Ip·M
Wherein, I p is a predicted repair image, I is a real image, G i is an image healer C + is a repair edge image, I M;
step S24: calculating an image restoration loss function, and optimizing a restoration result of an image restoration model, wherein the image restoration loss function comprises image contrast loss, style loss and perception loss, and the calculation method comprises the following steps:
wherein, lambda 3, lambda 4, lambda 5 and lambda 6 are custom super-parameters, The contrast loss generated for the image restoration model,For style loss,/>Is a perceived loss.
In the present invention, further, the image restorer generating the repaired picture from the repaired edge image after multiple downsamplings, multiple dilated-convolution-based residual convolutions, one multi-head attention network and two deconvolutions comprises: Step S2-1: applying different convolutional transformations to the feature map obtained through the convolution layers and the residual network to obtain several groups of query, key and value feature maps;
step S2-2: acquiring a reconstructed feature map;
Step S2-3: splicing the reconstructed feature images according to the channel dimension to obtain a plurality of attention combination results;
Step S2-4: after the original input feature size is converted by the convolution network conversion, the restored reconstructed feature map and the original feature map are added to be used as a final output restoration picture result.
In the present invention, further, the step S2-2 of obtaining the reconstructed feature map includes:
Step S2-2-1: converting the key feature map into a rank, and performing dot product operation on groups between the query feature map and the key feature map after converting the rank to obtain a plurality of groups of correlation attention matrixes;
Step S2-2-2: normalizing the correlation attention moment array;
step S2-2-3: and carrying out matrix multiplication operation on the normalized self-attention matrix of each group of correlations and the value feature map of the group to obtain a reconstructed feature map of the group.
Step S2-3 splices the reconstructed feature maps along the channel dimension; obtaining the multi-head attention combination result comprises the following steps:
step S2-3-1: obtaining the attention result of the i-th head:

headi = Attention(Qi, Ki, Vi) = softmax(Qi·KiT/√dk)·Vi

wherein Qi, Ki and Vi represent the query, key and value feature-map matrices of the i-th head, and dk is the key channel dimension;
step S2-3-2: splicing the self-attention results of all heads, and using a WO matrix to fuse and project the several feature subspaces back to the original matrix size, finally obtaining the multi-head self-attention combination result:

MultiHead = Concat(head1, head2, ..., headh)·WO

wherein WO is a learnable projection matrix.
The style loss calculation method is:

Lstyle = Ei[ ‖Gri(Ip) − Gri(I)‖1 ]

wherein Gri(Ip) represents the Gram matrix of the predicted-image activation inner products, Gri(I) represents the Gram matrix of the true-image activation inner products, and cihiwi represents the dimension of the i-th activation feature, used to normalize the Gram matrix.
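A small NumPy sketch of the Gram-matrix style loss above; the random activations and the ℓ1 distance are illustrative stand-ins for the pretrained-network features used in practice:

```python
import numpy as np

def gram(feat):
    """Gram matrix of an (h, w, c) activation, normalized by c*h*w."""
    h, w, c = feat.shape
    f = feat.reshape(h * w, c)
    return f.T @ f / (c * h * w)

def style_loss(feat_pred, feat_true):
    """Mean absolute difference between Gram matrices (one feature layer)."""
    return np.abs(gram(feat_pred) - gram(feat_true)).mean()

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8, 4))   # mock activation of the predicted image
```

For identical activations the loss is exactly zero, which is the sanity check one would run before wiring this into a training loop.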
In the present invention, further, in step S2-2-3 the elements of the value feature map are weighted and reconstructed pixel by pixel using the group's correlation attention matrix, the weights of the other elements in the weighted reconstruction being the corresponding values in the correlation attention matrix.
Compared with the prior art, the invention has the beneficial effects that:
According to the invention, a multi-head attention network capable of capturing richer long-distance correlations between pixel regions is added after the last residual layer of the image repair model. To let the model learn information in different subspaces, several parallel attention computations are used; each head processes different information, so features of different parts can be handled and richer long-distance correlations extracted. The multi-head self-attention network can learn correlation matrices of different patterns, which plays a very important role in improving repair results and thus the restoration effect of the image.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a general flow chart of an image restoration method of the edge prior fusion multi-head attention mechanism of the present invention;
FIG. 2 is a flowchart of step S2 in an image restoration method of an edge prior fusion multi-head attention mechanism of the present invention;
FIG. 3 is a workflow diagram of step S2-2 and step S2-3 in an image restoration method of an edge prior fusion multi-head attention mechanism of the present invention;
FIG. 4 is a flowchart of an implementation of a method for repairing an edge repair model in an image repair method of an edge prior fusion multi-head attention mechanism of the present invention;
FIG. 5 is a schematic diagram of acquiring a query, key, value feature map in an image restoration method of an edge prior fusion multi-head attention mechanism of the present invention;
FIG. 6 is a schematic diagram of a correlation attention matrix acquisition flow in an image restoration method of an edge prior fusion multi-head attention mechanism according to the present invention
FIG. 7 is a schematic flow chart of a reconstructed feature map in an image restoration method of an edge prior fusion multi-head attention mechanism of the invention;
FIG. 8 is a diagram of a multi-head self-attention layer network architecture in an image restoration method of the edge prior fusion multi-head attention mechanism of the present invention;
FIG. 9 is a schematic diagram of an edge restoration model construction framework in an image restoration method of an edge prior fusion multi-head attention mechanism of the invention;
FIG. 10 is a schematic diagram of an edge image restoration model construction framework in an image restoration method of an edge prior fusion multi-head attention mechanism of the invention;
Fig. 11 is a schematic diagram of experimental results of an image restoration method of the edge prior fusion multi-head attention mechanism of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It will be understood that when an element is referred to as being "fixed to" another element, it can be directly on the other element or intervening elements may also be present. When a component is considered to be "connected" to another component, it can be directly connected to the other component or intervening components may also be present. When an element is referred to as being "disposed on" another element, it can be directly on the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, a preferred embodiment of the present invention provides an image restoration method for edge prior fusion multi-head attention mechanism, comprising
Step S1: acquiring experimental data and preprocessing the data, wherein the experimental data comprises a training set and a testing set, and extracting an edge map of an image from the preprocessed image;
Step S2: the method comprises the steps of constructing an edge first-fusion multi-attention mechanism repair model, wherein the edge repair model comprises an edge repair model and an image repair model, the edge repair model takes an extracted edge image, an original image and a mask image as inputs and outputs the extracted edge image and the original image and the mask image as repaired edge images, and the image repair model trains by taking the repaired edge images and defect images as inputs;
The image repair model comprises an image restorer, which generates the repaired picture from the repaired edge image after multiple downsamplings, multiple dilated-convolution-based residual convolutions, one multi-head attention network and two deconvolutions;
step S3: and evaluating the result of fusing the multi-attention mechanism repair model on the edge through the test set.
Specifically, the method collects a sufficient number of good-quality related images according to the experimental requirements to complete data acquisition, then performs preliminary preprocessing to obtain data meeting the standard, divides the dataset into a training set and a test set, builds the image repair model step by step according to the algorithm design, trains the model with the training set, and evaluates the model's effect with the test set. In this scheme, the multi-head attention mechanism is fused into the edge-prior repair model so that richer long-distance pixel dependencies are extracted, improving the image repair effect.
In the invention, the CelebA public dataset is adopted, and the pictures are resized to 256×256 before being used in experiments. Since the dataset is not pre-divided into training, validation and test sets, the first 180,000 pictures are selected for training the model, and 4,000 pictures are selected as the test set for analysis and comparison of experimental results. In addition, the mask images used during model training are taken from an irregular mask dataset, whose irregular masks are divided into six groups according to the ratio of the missing area to the whole image, namely 0–10%, 10–20%, 20–30%, 30–40%, 40–50% and 50–60%. Each group contains 2,000 images: 1,000 mask images represent cases where the image boundary is missing, and the other 1,000 represent cases where the image boundary is intact.
For the original input images of the training set, the edge map is extracted by Canny edge detection, which comprises four steps: Gaussian filtering; computing gradient magnitudes and directions; non-maximum suppression; and edge detection with upper and lower (hysteresis) thresholds. The resulting edge map of the original image is then masked with the binary mask, and the repaired edge image is generated by the adversarial network. In the test task, 4,000 pictures are taken from the CelebA dataset as the test set, and masks with random missing regions are used to simulate the defects; the hand-drawn masks used for testing are divided into four groups according to the ratio of the missing area, from small to large, each group containing 1,000 masks, 4,000 in total.
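The four Canny stages can be approximated in a few lines. The sketch below is a simplified stand-in, not the exact detector used in the experiments: it performs Gaussian-like smoothing, Sobel-style gradients and a double threshold, and omits the non-maximum-suppression and hysteresis-linking bookkeeping:

```python
import numpy as np

def sobel_edges(img, lo=0.1, hi=0.3):
    """Tiny Canny-like sketch: blur, gradient magnitude, double threshold.
    `img` is a 2-D float array in [0, 1]; returns a boolean edge map."""
    # 3x3 Gaussian-like blur
    k = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], float) / 16.0
    pad = np.pad(img, 1, mode="edge")
    blur = sum(k[i, j] * pad[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3))
    # central-difference gradients (Sobel-style)
    gx = np.zeros_like(blur)
    gy = np.zeros_like(blur)
    gx[:, 1:-1] = blur[:, 2:] - blur[:, :-2]
    gy[1:-1, :] = blur[2:, :] - blur[:-2, :]
    mag = np.hypot(gx, gy)
    strong = mag >= hi
    weak = (mag >= lo) & ~strong
    return strong | weak   # no NMS / hysteresis linking in this sketch

img = np.zeros((16, 16))
img[:, 8:] = 1.0              # vertical step edge at column 8
edges = sobel_edges(img)
```

A production pipeline would instead call a full Canny implementation; the point here is only the flow of the four stages named in the text.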
In the present invention, further, as shown in fig. 9, the edge repair model comprises an edge restorer, which downsamples the extracted edge map, the original image and the mask image, and converts the feature map into a single-channel edge map after multiple dilated-convolution-based residual convolutions and two deconvolutions. Specifically, the scheme converts the feature map into a single-channel edge map after 3 downsamplings, 8 dilated-convolution-based residual convolutions and 2 deconvolutions.
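An illustrative NumPy sketch of one dilated-convolution residual block of the kind stacked eight times above; single channel, 3×3 kernel, and the kernel values and dilation rate are placeholders:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=2):
    """'Same'-size 2-D convolution of a single-channel map with a dilated
    3x3 kernel (zero padding at the borders)."""
    d = dilation
    pad = np.pad(x, d)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * pad[i*d:i*d + x.shape[0],
                                      j*d:j*d + x.shape[1]]
    return out

def residual_block(x, kernel, dilation=2):
    """Residual connection around dilated convolution + ReLU."""
    return x + np.maximum(dilated_conv2d(x, kernel, dilation), 0.0)

x = np.ones((8, 8))
identity_k = np.zeros((3, 3))
identity_k[1, 1] = 1.0          # identity kernel, so the block computes x + relu(x)
y = residual_block(x, identity_k)
```

Real models use multi-channel learned kernels; this only shows how dilation enlarges the receptive field while the residual add preserves the feature size.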
Specifically, in the present invention, as shown in fig. 4, the repairing method of the edge repairing model is as follows:
step S20: the edge restorer concatenates the edge map of the image to be repaired, the mask image and the gray-scale map of the image to be repaired into a tensor as input, thereby obtaining the predicted edge repair result of the edge restorer; the generated result of the edge repair model is then obtained from the predicted edge repair result, preserving the image edges of the intact region and filling in the edge portion to be repaired in the missing region:
Cp=Ge(M,C,Igray)
C+=C·(1-M)+Cp·M
Where Cp represents the predicted edge repair image, Ge represents the edge restorer, M represents the mask image, C represents the edge map of the image to be repaired, Igray represents the gray-scale map of the image to be repaired, and C+ represents the repaired edge image generated by the edge repair model.
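The composition Cp → C+ above is a simple mask blend; a NumPy sketch (the generator output Cp is mocked, and m == 1 marks missing pixels):

```python
import numpy as np

def compose_edges(c, c_pred, m):
    """C+ = C*(1-M) + Cp*M: keep known-region edges from c, take the
    predicted edges c_pred inside the missing region marked by mask m."""
    return c * (1 - m) + c_pred * m

c = np.array([[1., 0.], [0., 1.]])       # edge map of the damaged image
c_pred = np.array([[0., 1.], [1., 0.]])  # edges predicted by the generator (mock)
m = np.array([[0., 1.], [1., 0.]])       # mask of the missing region
c_plus = compose_edges(c, c_pred, m)
```

With an all-zero mask the function returns the original edge map unchanged, which matches the intent of preserving intact-region edges.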
In the present invention, further, the repairing method of the edge repairing model further includes:
Step S21, calculating a loss function of the edge restorer, wherein the loss function is a weighted summation of the generated edge countermeasure loss and the edge characteristic loss;
Step S22: and optimizing the generation result of the edge restoration model according to the loss function to obtain a restored edge image finally output by the edge restoration model.
In particular, the loss function of the edge repair model is a mixed loss function whose purpose is to constrain the results processed by the edge discriminator, the mixed function being a weighted sum of the generated-edge adversarial loss and the edge feature loss. The generated-edge adversarial loss is a form of binary cross-entropy, which can be written as:

Ladv,1 = E(C,Igray)[log De(C, Igray)] + EIgray[log(1 − De(Cp, Igray))]

wherein Ladv,1 represents the edge adversarial loss, the first expectation is taken over real edge maps and gray-scale maps, the second over generated edge maps and gray-scale maps, and De represents the edge discriminator.
Secondly, the edge feature loss is a distance function defined on the feature layers of the edge discriminator; its main function is to compute the sum of the distances between the features of the generated edges and of the edges actually detected with Canny, as extracted by the different layers. The feature loss formula is:

LFM = E[ Σi=1..n (1/Ni)·‖De(i)(C) − De(i)(Cp)‖1 ]

wherein LFM represents the edge feature loss, n represents the number of activation layers of the edge discriminator, Ni is the number of elements in the i-th activation layer, and De(i) denotes the activations of the i-th layer of De.
Finally, the optimization objective of the edge model can be written as:

minGe maxDe LGe = minGe( λ1·maxDe(Ladv,1) + λ2·LFM )

wherein minGe denotes minimization over the edge restorer, maxDe denotes maximization over the edge discriminator, and λ1 and λ2 are regularization weights.
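A toy NumPy rendering of the two edge-loss terms; the discriminator outputs and feature activations are mocked arrays, not the trained networks:

```python
import numpy as np

def adversarial_loss(d_real, d_fake, eps=1e-8):
    """Binary cross-entropy form of the edge adversarial loss (batch means)."""
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1 - d_fake + eps))

def feature_match_loss(feats_real, feats_fake):
    """Mean L1 distance between discriminator activations, summed over layers."""
    return sum(np.abs(fr - ff).mean() for fr, ff in zip(feats_real, feats_fake))

d_real = np.array([0.9, 0.8])   # mock D_e scores on real edges
d_fake = np.array([0.2, 0.1])   # mock D_e scores on generated edges
adv = adversarial_loss(d_real, d_fake)
fm = feature_match_loss([np.ones((4, 4))], [np.zeros((4, 4))])
```

A perfect discriminator (scores 1 on real, 0 on fake) drives the adversarial term toward 0, its maximum; any miss makes it negative.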
In the invention, as shown in fig. 10, the image repair model comprises an image restorer, which generates the repaired picture from the repaired edge image after multiple downsamplings, multiple dilated-convolution-based residual convolutions, one multi-head attention network and two deconvolutions.
A convolutional neural network attends only to the pixel values of a local area when learning features and ignores the influence of correlations between pixels in remote areas on image generation and restoration, so many attention-mechanism models have been designed to better capture long-range dependencies. The multi-head self-attention network used here is an extension of the self-attention network, which can effectively capture long-distance relationships among pixels in an image. However, each region does not have only one set of long-distance pixel relationships, and a single self-attention network is insufficient to learn multiple such relationships, so we employ a multi-head attention network that can capture richer long-distance relationships between pixel regions. The multi-head self-attention network can learn correlation matrices of different patterns, which plays a very important role in improving repair results.
Specifically, as shown in fig. 2, the scheme adds a multi-head self-attention layer network after the last residual layer, and the specific scheme is as follows:
Step S2-1: the obtained feature images passing through the convolution layer and the residual error network are subjected to different convolution changes to obtain a plurality of groups query, key, value of feature images;
Specifically, as shown in fig. 5, the size of the query feature map is Bg × Wf × Hf × Cq, wherein Bg is the batch size of the hidden variables input by the generator, Wf is the width of the query feature map, Hf is its height, and Cq is its channel dimension. The key feature map has size Bg × Wf × Hf × Ck, its other parameters being the same as those of the query feature map, with Ck the channel dimension of the key feature map. The value feature map has size Bg × Wf × Hf × Cv, its other parameters being the same as those of key and query, with Cv its channel dimension.
Step S2-2: the reconstructed feature map is obtained, as shown in fig. 3, and the specific method comprises the following steps:
Step S2-2-1: converting the key feature map into a rank, and performing dot product operation on groups between the query feature map and the key feature map after converting the rank, as shown in fig. 6, to obtain a plurality of groups of correlation attention matrixes;
Step S2-2-2: the correlation attention moment array is normalized, and in the step, the dot-integration matrix is normalized by a method such as Softmax.
Step S2-2-3: the normalized self-attention matrix of each group of correlations is matrix multiplied with the value feature map of the group, as shown in fig. 7, to obtain a reconstructed feature map of the group. And the value feature map elements in the value feature map are used for carrying out weighted reconstruction on the pixels by using the group of correlation attention matrixes, and the weight values of other elements in the weighted reconstruction process are pixel values corresponding to the correlation attention moment matrixes.
Further, after the reconstructed feature map is obtained, step S2-3 is performed, as shown in fig. 8:
Step S2-3: and splicing the reconstructed feature images according to the channel dimension to obtain a plurality of attention combination results.
Step S2-4: the combined result is transformed back to the original input feature size by a convolution network, and the restored reconstructed feature map is added to the original feature map as the final output of the attention layer.
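Putting steps S2-1 through S2-4 together, a hedged NumPy sketch of the whole multi-head self-attention layer, including the per-head split, the concatenation, the output projection and the residual add that keeps the output size equal to the input size (the head count and weight shapes are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Steps S2-1..S2-4: project, attend per head, concat, project with Wo, residual add."""
    B, W, H, C = x.shape
    N = W * H

    def split_heads(t):
        # (B, N, C) -> (B, n_heads, N, C // n_heads)
        return t.reshape(B, N, n_heads, C // n_heads).transpose(0, 2, 1, 3)

    q = split_heads((x @ Wq).reshape(B, N, C))
    k = split_heads((x @ Wk).reshape(B, N, C))
    v = split_heads((x @ Wv).reshape(B, N, C))

    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    heads = attn @ v                                          # per-head reconstruction

    concat = heads.transpose(0, 2, 1, 3).reshape(B, N, C)     # Concat(head_1..head_h)
    out = (concat @ Wo).reshape(B, W, H, C)                   # fuse back to original size
    return x + out                                            # residual: size unchanged
```

Because of the residual add, the layer can be dropped into the restorer without changing any downstream tensor shapes, which matches the statement that the output feature size is not changed.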
In one embodiment provided by the invention, a specific method for obtaining multiple attention combination results is as follows:
step S2-3-1: obtain the attention result of the i-th head:

head_i = Attention(Q_i, K_i, V_i) = softmax(Q_i K_i^T / √d_k) V_i

wherein Q_i, K_i and V_i denote the query, key and value feature-map matrices of the i-th head, and d_k is the per-head key dimension;
step S2-3-2: the self-attention results of all heads are concatenated, and a matrix W^O is used to fuse and project the multiple feature spaces back to the original matrix size, finally giving the multi-head self-attention combination result:

MultiHead = Concat(head_1, head_2, ..., head_h) W^O

wherein head_i is the attention result of the i-th head and W^O is the learned output projection matrix.
In summary, the scheme adds a multi-head self-attention layer without changing the output feature size, so that long-range information processed by the multiple heads participates more fully in the reconstruction, further improving the image restoration effect.
In the invention, further, the image restoration model repairs the image that has been processed by the edge restoration model; the specific repair method comprises the following steps:
Step S23: using the tensor formed by concatenating the repaired edge image and the damaged image as input to obtain a predicted repair image, and obtaining the repair image from the predicted repair image:
I_p = G_i(M, C_+, I_M)

I_+ = I·(1-M) + I_p·M

where I_p is the predicted repair image, I is the real image, G_i is the image restorer, C_+ is the repaired edge map, I_M is the defect image, and I_+ is the repair image.
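The compositing formula I_+ = I·(1-M) + I_p·M, which keeps the known pixels of the real image and pastes the predicted pixels into the masked hole, can be sketched directly (a minimal illustration; array shapes are hypothetical):

```python
import numpy as np

def composite_repair(I, Ip, M):
    """I_+ = I*(1-M) + I_p*M: keep known pixels, paste predictions where the mask is 1."""
    return I * (1 - M) + Ip * M
```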
Step S24: calculating an image restoration loss function, and optimizing the restoration result of the image restoration model, wherein the image restoration loss function comprises the image adversarial loss, the style loss and the perceptual loss.
For example, the image adversarial loss is analogous to the generated-edge adversarial loss of the edge restoration model, and the adversarial loss L_adv of the image restoration model takes the same form.
Furthermore, style loss was first proposed for the task of image style transfer; in a subsequent improvement, the introduction of the Gram matrix (Gram Matrix) alleviated the artifact problem that exists in deconvolution. The model here adopts a style loss based on the Gram matrix, with the loss function L_style expressed as follows:

L_style = E_i[ ||Gr_i(I_p) - Gr_i(I_M)||_1 ]
wherein Gr_i(I_p) represents the Gram matrix of the predicted-image vector inner products, Gr_i(I_M) represents the Gram matrix of the true-image vector inner products, and c_i h_i w_i represents the dimension of the activation features. Four activation layers of the VGG19 network, relu2_2, relu3_4, relu4_4 and relu5_2, are selected here.
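The Gram-matrix style loss can be sketched in NumPy; the c_i·h_i·w_i normalisation is folded into the Gram matrix here, and the activations would in practice come from the selected VGG19 layers (this sketch takes them as given arrays):

```python
import numpy as np

def gram_matrix(feat):
    """Gram matrix of a (C, H, W) activation, normalised by C*H*W."""
    C, H, W = feat.shape
    f = feat.reshape(C, H * W)
    return (f @ f.T) / (C * H * W)

def style_loss(feats_pred, feats_true):
    """L1 distance between Gram matrices, summed over the chosen activation layers."""
    return sum(np.abs(gram_matrix(a) - gram_matrix(b)).mean()
               for a, b in zip(feats_pred, feats_true))
```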
In addition, the perceptual loss penalizes generated images that do not match the perception of the real image by defining a distance measure between activation layers of a pre-trained network; it can be defined as:

L_perc = E[ Σ_i w_i ||φ_i(I) - φ_i(I_p)||_1 ]
wherein φ_i in the formula corresponds to the five activation layers of the pre-trained VGG19 network, relu1_1, relu2_1, relu3_1, relu4_1 and relu5_1, respectively, and w_i represents the weight parameter (in this scheme all w_i are set to 1).
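Similarly, a minimal sketch of the perceptual loss as a weighted L1 distance between activation maps, with the scheme's default of all w_i = 1; the activations would come from the five VGG19 layers, but here they are passed in as plain arrays:

```python
import numpy as np

def perceptual_loss(feats_pred, feats_true, weights=None):
    """Weighted L1 distance between corresponding activation maps (w_i = 1 by default)."""
    if weights is None:
        weights = [1.0] * len(feats_pred)   # the scheme sets all w_i to 1
    return sum(w * np.abs(a - b).mean()
               for w, a, b in zip(weights, feats_pred, feats_true))
```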
In summary, the loss function of the image restoration model combines multiple loss functions and can be calculated jointly as:
wherein λ_3, λ_4, λ_5 and λ_6 are custom hyperparameters, L_adv is the adversarial loss generated by the image restoration model, L_style is the style loss, and L_perc is the perceptual loss.
In the invention, further, after the edge prior fusion multi-attention mechanism repair model is trained, the model's repair results are tested and evaluated on the test set; this part is mainly implemented with the PyTorch learning framework on two 1080 Ti GPUs. The quality and repair effect of the model are evaluated with four metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), ℓ1 error and Fréchet Inception Distance (FID).
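Of the four evaluation metrics, PSNR and the ℓ1 error are simple enough to sketch directly in NumPy (SSIM and FID need substantially more machinery and are omitted here):

```python
import numpy as np

def psnr(img_true, img_pred, max_val=255.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((img_true.astype(np.float64) - img_pred.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def l1_error(img_true, img_pred):
    """Mean absolute pixel difference."""
    return np.mean(np.abs(img_true.astype(np.float64) - img_pred.astype(np.float64)))
```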
In addition, as shown in fig. 11, for the repair results of the edge prior fusion multi-attention mechanism repair model, from left to right the first image is the original, the second is the image to be repaired covered by a binary mask, the third is the image repaired by the edge repair model, and the fourth and fifth are result pictures repaired by the image repair model. It can be intuitively observed that the pictures repaired by the edge prior fusion multi-attention mechanism repair model are very similar to the originals; the repair of certain completely missing parts still differs from the original, but the difference is imperceptible to human observation. The scheme thus achieves a good repair effect and can reasonably repair the missing parts. The results show that the network with the fused multi-head attention mechanism performs better than expected in image restoration.
The foregoing description is directed to the preferred embodiments of the present invention, but the embodiments are not intended to limit the scope of the invention, and all equivalent changes or modifications made under the technical spirit of the present invention should be construed to fall within the scope of the present invention.
Claims (7)
1. An image restoration method of an edge prior fusion multi-head attention mechanism, characterized by comprising the following steps:
Step S1: acquiring experimental data and preprocessing the data, wherein the experimental data comprises a training set and a testing set, and extracting an edge map of an image from the preprocessed image;
Step S2: constructing an edge prior fusion multi-attention mechanism repair model, wherein the repair model comprises an edge repair model and an image repair model; the edge repair model takes the extracted edge map, the original image and a mask image as inputs and outputs a repaired edge image, and the image repair model is trained with the repaired edge image and the defect image as inputs;
The image repair model comprises an image restorer, and the image restorer generates a repair picture from the repaired edge image after multiple down-sampling operations, multiple residual convolutions based on dilated convolution, one multi-head attention network and two deconvolutions;
step S3: evaluating the result of the edge prior fusion multi-attention mechanism repair model through the test set;
The image restorer generating a repair picture from the repaired edge image after multiple down-sampling operations, multiple residual convolutions based on dilated convolution, one multi-head attention network and two deconvolutions comprises the following steps:
Step S2-1: the feature maps obtained through the convolution layers and the residual network are subjected to different convolution transformations to obtain multiple groups of query, key and value feature maps;
step S2-2: acquiring a reconstructed feature map;
Step S2-3: splicing the reconstructed feature images according to the channel dimension to obtain a plurality of attention combination results;
Step S2-4: transforming the combined result back to the original input feature size through a convolution network, and adding the restored reconstructed feature map to the original feature map to obtain the final output repair picture result;
The step S2-2 of obtaining the reconstructed feature map comprises the following steps:
Step S2-2-1: transposing the key feature map, and performing group-wise dot-product operations between the query feature map and the transposed key feature map to obtain multiple groups of correlation attention matrices;
Step S2-2-2: normalizing the correlation attention matrices;
step S2-2-3: performing matrix multiplication operation on the normalized self-attention matrix of each group of correlations and the value feature map of the group to obtain a reconstructed feature map of the group;
In step S2-3, splicing the reconstructed feature maps according to the channel dimension to obtain the attention combination results comprises the following steps:
step S2-3-1: obtain the attention result of the i-th head:

head_i = Attention(Q_i, K_i, V_i) = softmax(Q_i K_i^T / √d_k) V_i

wherein Q_i, K_i and V_i denote the query, key and value feature-map matrices of the i-th head, and d_k is the per-head key dimension;
Step S2-3-2: splicing the self-attention results of the individual heads, and using the W^O matrix to fuse and project the multiple feature spaces back to the original matrix size, finally obtaining the multi-head self-attention combination result:

MultiHead = Concat(head_1, head_2, ..., head_h) W^O.
2. The image restoration method of an edge prior fusion multi-head attention mechanism according to claim 1, wherein the edge repair model comprises an edge restorer; the edge restorer down-samples the extracted edge map, the original image and the mask image, and converts the feature map into a single-channel edge map after multiple residual convolutions based on dilated convolution and two deconvolutions.
3. The image restoration method of an edge prior fusion multi-head attention mechanism according to claim 2, wherein the restoration method of the edge restoration model is as follows:
Step S20: obtaining a predicted edge repair result from the edge restorer, and obtaining the generation result of the edge repair model from the predicted edge repair result, which retains the image edges of the known region and fills in the edge portions to be repaired in the missing region:
C_p = G_e(M, C, I_gray)

C_+ = C·(1-M) + C_p·M
where C_p represents the predicted edge repair image, G_e represents the edge restorer, M represents the mask image, C represents the edge map of the image to be repaired, I_gray represents the gray-scale map of the image to be repaired, and C_+ represents the repaired edge image generated by the edge repair model.
4. An image restoration method according to claim 3, wherein said repair method of said edge repair model further comprises:
step S21: calculating a loss function of the edge restorer, wherein the loss function is a weighted sum of the generated edge adversarial loss and the edge feature loss;
step S22: and optimizing the generation result of the edge restoration model to obtain a restored edge image.
5. The image restoration method of an edge prior fusion multi-head attention mechanism according to claim 1, wherein the restoration method of the image restoration model comprises the following steps:
Step S23: obtaining a predicted repair image by using tensors spliced by the repaired edge image and the damaged image as input, and obtaining the repair image according to the predicted repair image:
I_p = G_i(M, C_+, I_M)

I_+ = I·(1-M) + I_p·M
wherein I_p is the predicted repair image, I is the real image, G_i is the image restorer, C_+ is the repaired edge image, I_M is the defect image, and I_+ is the repair image;
step S24: calculating an image restoration loss function and optimizing the restoration result of the image restoration model, wherein the image restoration loss function comprises the image adversarial loss, the style loss and the perceptual loss, and the calculation method is as follows:
wherein λ_3, λ_4, λ_5 and λ_6 are custom hyperparameters, L_adv is the adversarial loss generated by the image restoration model, L_style is the style loss, and L_perc is the perceptual loss.
6. The image restoration method of an edge prior fusion multi-head attention mechanism according to claim 5, wherein the style loss calculation method is as follows:
wherein Gr_i(I_p) represents the Gram matrix of the predicted-image vector inner products, Gr_i(I_M) represents the Gram matrix of the true-image vector inner products, and c_i h_i w_i represents the dimension of the activation features.
7. The method for repairing an image with an edge prior fusion multi-head attention mechanism according to claim 1, wherein each value-feature-map element in step S2-2-3 is reconstructed as a weighted combination of the other elements using the group's correlation attention matrix, the weights in the weighted reconstruction process being the corresponding values of the correlation attention matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111356234.6A CN114022506B (en) | 2021-11-16 | 2021-11-16 | Image restoration method for edge prior fusion multi-head attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114022506A CN114022506A (en) | 2022-02-08 |
CN114022506B true CN114022506B (en) | 2024-05-17 |
Family
ID=80065024
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111356234.6A Active CN114022506B (en) | 2021-11-16 | 2021-11-16 | Image restoration method for edge prior fusion multi-head attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114022506B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116188875B (en) * | 2023-03-29 | 2024-03-01 | 北京百度网讯科技有限公司 | Image classification method, device, electronic equipment, medium and product |
CN117649365A (en) * | 2023-11-16 | 2024-03-05 | 西南交通大学 | Paper book graph digital restoration method based on convolutional neural network and diffusion model |
CN117351015B (en) * | 2023-12-05 | 2024-03-19 | 中国海洋大学 | Tamper detection method and system based on edge supervision and multi-domain cross correlation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111127346A (en) * | 2019-12-08 | 2020-05-08 | 复旦大学 | Multi-level image restoration method based on partial-to-integral attention mechanism |
CN113112411A (en) * | 2020-01-13 | 2021-07-13 | 南京信息工程大学 | Human face image semantic restoration method based on multi-scale feature fusion |
CN113240613A (en) * | 2021-06-07 | 2021-08-10 | 北京航空航天大学 | Image restoration method based on edge information reconstruction |
CN113379655A (en) * | 2021-05-18 | 2021-09-10 | 电子科技大学 | Image synthesis method for generating antagonistic network based on dynamic self-attention |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11398015B2 (en) * | 2020-04-29 | 2022-07-26 | Adobe Inc. | Iterative image inpainting with confidence feedback |
Non-Patent Citations (2)
Title |
---|
Generative high-resolution image inpainting based on parallel adversarial learning and multi-condition fusion; Shao Hang; Wang Yongxiong; Pattern Recognition and Artificial Intelligence; 2020-04-15 (No. 04); full text *
Research on image inpainting technology based on generative adversarial networks; Li Ju; Huang Wenpei; Computer Applications and Software; 2019-12-12 (No. 12); full text *
Also Published As
Publication number | Publication date |
---|---|
CN114022506A (en) | 2022-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114022506B (en) | Image restoration method for edge prior fusion multi-head attention mechanism | |
CN114092330B (en) | Light-weight multi-scale infrared image super-resolution reconstruction method | |
CN113240580A (en) | Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation | |
CN110136062B (en) | Super-resolution reconstruction method combining semantic segmentation | |
CN107977932A (en) | It is a kind of based on can differentiate attribute constraint generation confrontation network face image super-resolution reconstruction method | |
CN111787187B (en) | Method, system and terminal for repairing video by utilizing deep convolutional neural network | |
CN112183637A (en) | Single-light-source scene illumination re-rendering method and system based on neural network | |
CN115018727A (en) | Multi-scale image restoration method, storage medium and terminal | |
CN114897742B (en) | Image restoration method with texture and structural features fused twice | |
CN110490968A (en) | Based on the light field axial direction refocusing image super-resolution method for generating confrontation network | |
CN113793261A (en) | Spectrum reconstruction method based on 3D attention mechanism full-channel fusion network | |
CN111882516B (en) | Image quality evaluation method based on visual saliency and deep neural network | |
CN117095287A (en) | Remote sensing image change detection method based on space-time interaction transducer model | |
Ma et al. | Multi-task interaction learning for spatiospectral image super-resolution | |
CN116757986A (en) | Infrared and visible light image fusion method and device | |
CN113888399B (en) | Face age synthesis method based on style fusion and domain selection structure | |
CN114998667A (en) | Multispectral target detection method, multispectral target detection system, computer equipment and storage medium | |
CN116523985B (en) | Structure and texture feature guided double-encoder image restoration method | |
CN117456330A (en) | MSFAF-Net-based low-illumination target detection method | |
CN116993639A (en) | Visible light and infrared image fusion method based on structural re-parameterization | |
CN115984949A (en) | Low-quality face image recognition method and device with attention mechanism | |
CN113205005B (en) | Low-illumination low-resolution face image reconstruction method | |
KR102340387B1 (en) | Method of learning brain connectivity and system threrfor | |
CN113888417A (en) | Human face image restoration method based on semantic analysis generation guidance | |
CN114331821A (en) | Image conversion method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||