CN115829880A - Image restoration method based on context structure attention pyramid network - Google Patents
- Publication number
- CN115829880A (application number CN202211664365.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- loss function
- scale
- attention
- context
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an image restoration method based on a context structure attention pyramid network, which comprises the following steps: using the CelebA-HQ and Places2 data sets, the data are organized, divided into a training set and a test set, and preprocessed; a pyramid network based on a structural attention mechanism is constructed and trained on the training set to obtain an initial image restoration model; the test set is repaired with the model; and the repair capability of the model is evaluated through quantitative indexes. The invention uses a U-Net structure as a backbone, encodes the context from low-level pixels into high-level semantic features, and decodes the context back into an image. By transferring structural attention layer by layer from deep to shallow layers to fill the missing region, the consistency between the synthesized texture and the generated structure is improved, and an image with fine-grained details can be restored. Compared with existing algorithms, the algorithm has strong robustness and universality and a better repair effect.
Description
Technical Field
The invention belongs to the field of computer image restoration, and particularly relates to an image restoration method based on a context structure attention pyramid network.
Background
Image restoration is a technique for estimating and restoring the content of a damaged or missing region from the known content of an image, so that the restored image satisfies human visual perception as far as possible. As an important research topic in computer vision and computer graphics, it is widely applied in cultural and everyday fields such as digital cultural heritage protection, object removal, old photo restoration, and film and television special effects.
Image inpainting methods are divided into conventional methods and deep-learning-based methods. Conventional image restoration algorithms fall mainly into two categories. The first is diffusion-based: the pixels to be filled are computed from the boundary pixels using a differential equation. This approach is usually only suitable for narrow missing regions such as cracks and scratches; it cannot infer image texture details and is therefore weak at detail restoration. The second is based on texture synthesis: its core idea is to take a pixel block at the edge of the missing region, search the intact image region for the most similar sample block, fill the missing region with the found sample block, and iterate this process until the whole missing region is filled.
These conventional algorithms have clear limitations: when the missing region is large, or when a strongly semantic scene is repaired, such as the facial features of a human face, the results are mediocre and lack robustness and generality.
Disclosure of Invention
The object of the invention is to provide an image restoration method based on a context structure attention pyramid network, addressing the above defects in the prior art.
In order to achieve the purpose, the invention provides the following technical scheme: the image restoration method based on the context structure attention pyramid network comprises the following steps:
S1, based on the original real images and their corresponding masks, construct an original real image data set and a data set of images to be repaired, and divide them into a training set and a test set;
S2, construct a convolution module, taking the images to be repaired in the training set as input and the feature map of each scale, output by the corresponding convolution layer, as output, in combination with a style loss function and a perceptual loss function;
S3, construct a structural attention module, taking the feature map of each scale as input and the initial feature repair map as output;
S4, construct a layered decoder, taking the images to be repaired in the training set as input and the initial feature repair map as output;
S5, construct a multi-scale decoder, taking the scale feature maps and the initial feature repair map as input and the repaired image at each scale as output, in combination with a scale reconstruction loss function;
S6, based on the layered decoder and the multi-scale decoder, construct the generator of an adversarial network, taking the images to be repaired in the training set as input and the repaired image as output, in combination with an edge-preserving loss function;
and S7, from the generator of the adversarial network, together with the discriminator and the adversarial training loss function, construct and train an image restoration model, taking the images to be repaired in the training set as input and the corresponding original real images as the target, using the global loss function during training to obtain the image restoration model.
Further, in step S2, the convolution module comprises 7 convolution layers; the convolution kernel of each layer is 3×3, the stride is 2, and the padding is 1, so that a feature map is extracted at each scale from deep to shallow.
Further, the aforementioned step S3 comprises the following sub-steps:
S3.1, select n×n feature blocks inside and outside the missing region on two adjacent scale feature maps, and compute the structural similarity s_{i,j} between each pair of feature blocks, where d(·) is the Euclidean distance and m and σ are the mean and standard deviation, respectively.
S3.2, apply the softmax function to the similarities to obtain the attention score of each feature block:
α_{i,j} = exp(s_{i,j}) / Σ_{j′} exp(s_{i,j′}).
S3.3, after obtaining the attention scores on the high-level feature map, perform attention transfer: the missing region of the adjacent lower-level feature map is filled by weighting known patches with the attention scores, where l is the layer index and the weighted sum of known patches yields the filled region of the initial feature repair map.
Further, step S5 is specifically: the multi-scale decoder takes as input both the initial feature repair map output by the structural attention module and the scale feature maps output by the convolution module, and then decodes layer by layer, where ψ_L is the reconstructed feature of the L-th layer of the attention transfer network, φ_L is the feature map of the L-th layer of the encoder, d_L is the feature of the L-th layer of the multi-scale decoder, h is the transposed convolution, ⊕ denotes feature concatenation, and λ_1 and λ_2 are the corresponding weighting parameters.
Further, in the aforementioned step S6, the repaired image z is represented by the following formula:
z = ŷ ⊙ (1 − M) + x ⊙ M,
where ŷ is the generator output, x is the real image, M is the mask (taken here as 1 in the known region), and ⊙ is element-wise multiplication.
Further, the loss function of the discriminator in step S7 is as follows:
L_D = E_x[ReLU(1 − D(x))] + E_z[ReLU(1 + D(z))].
Further, in step S2, when constructing the convolution module, the perceptual loss function is as follows:
L_perc = Σ_j (1/N_j)·‖φ_j(x) − φ_j(z)‖_1,
where the real image and the restored image are compared through the activation feature maps of ReLU_i_1 (i = 1, 2, 3, 4, 5) of a VGG-19 network pre-trained on ImageNet, N_j denotes the number of elements in the j-th activation layer, φ_j(·) denotes the corresponding activation map, x is the real image, and z is the restored image;
when constructing the convolution module, the style loss function is used as follows:
L_style = Σ_j ‖G_j(x) − G_j(z)‖_1,
where G_j(·) is the C_j × C_j Gram matrix constructed from the activation map φ_j, x is the real image, and z is the restored image.
Further, in step S6, when constructing the generator of the countermeasure network, the edge preserving loss function is used as follows:
L_edge = ‖E(z) − E(x)‖_1,
wherein E (-) is a sobel filter, the image edge is extracted, x is a real image, and z is a model generation image.
Further, in step S5, when constructing the multi-scale decoder, the scale reconstruction loss function is as follows:
L_m = Σ_l λ_l ‖g_l(ψ_l) − x_l‖_1,
where x_l is the real image scaled to the same size as ψ_l, g_l(ψ_l) decodes ψ_l into an RGB image of the same size, and λ_l is the weight for each scale.
Further, in the aforementioned step S7, an image inpainting model is constructed and trained, and a global loss function is used in the training as follows:
L = α_1·L_m + α_2·L_adv + α_3·L_edge + α_4·L_perc + α_5·L_style,
wherein L is m Is a multi-scale reconstruction loss function, L adv Is a antagonistic training loss function, L edge Is an edge-preserving loss function, L perc Is a perceptual loss function, L style Is a style loss function, alpha 1 、α 2 、α 3 、α 4 、α 5 Are corresponding parameters.
Compared with the prior art, the invention has the following beneficial effects. With this method, repair quality can be further improved while the repair effect is guaranteed. Richer texture details are obtained through the layered encoder; the designed structural attention transfer module improves the consistency between the synthesized texture and the generated structure; the multi-scale decoder takes both groups of features as input and decodes them layer by layer, achieving visual and semantic consistency; in addition, the several loss functions make the repair result more realistic and natural.
Drawings
FIG. 1 is a schematic general flow diagram of the present invention.
FIG. 2 is a schematic diagram of the model structure of the present invention.
FIG. 3 is a schematic diagram of the structural attention transfer mechanism of the present invention.
FIG. 4 is a schematic diagram of the repair results of the present invention.
Detailed Description
In order to better understand the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.
Aspects of the invention are described herein with reference to the accompanying drawings, in which a number of illustrative embodiments are shown. Embodiments of the invention are not limited to those illustrated in the drawings; the invention may be implemented in any of the numerous concepts and embodiments described above or in the following detailed description. In addition, aspects of the present disclosure may be used alone or in any suitable combination with other aspects of the present disclosure.
As shown in fig. 1, the image restoration method based on the context structure attention pyramid network comprises the following steps:
S1, based on the original real images and their corresponding masks, construct an original real image data set and a data set of images to be repaired, and divide them into a training set and a test set;
S2, construct a convolution module, taking the images to be repaired in the training set as input and the feature map of each scale, output by the corresponding convolution layer, as output, in combination with a style loss function and a perceptual loss function;
S3, construct a structural attention module, taking the feature map of each scale as input and the initial feature repair map as output;
S4, construct a layered decoder, taking the images to be repaired in the training set as input and the initial feature repair map as output;
S5, construct a multi-scale decoder, taking the scale feature maps and the initial feature repair map as input and the repaired image at each scale as output, in combination with a scale reconstruction loss function;
S6, based on the layered decoder and the multi-scale decoder, construct the generator of an adversarial network, taking the images to be repaired in the training set as input and the repaired image as output, in combination with an edge-preserving loss function;
and S7, from the generator of the adversarial network, together with the discriminator and the adversarial training loss function, construct and train an image restoration model, taking the images to be repaired in the training set as input and the corresponding original real images as the target, using the global loss function during training to obtain the image restoration model.
The real image data sets used in the invention are the CelebA-HQ face data set and the Places2 scene data set. An irregular mask data set from the Nvidia team is used at the same time, containing masks in six different ranges of mask-area ratio. The regular mask is a rectangular mask, i.e. a white rectangular region taken at the center of the image. The real image is multiplied element by element with the mask to obtain the image to be repaired.
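The element-wise masking described above can be sketched as follows; this is a minimal illustration rather than part of the claimed method, and the convention that the mask equals 1 in the known region is an assumption:

```python
import numpy as np

# Sketch of constructing an image to be repaired by element-wise
# multiplication of the real image with the mask (assumed convention:
# mask = 1 in the known region, 0 in the missing region).
def make_damaged(real: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Element-wise product of the real image and the mask."""
    return real * mask

real = np.ones((4, 4, 3))       # toy stand-in for a real image
mask = np.ones((4, 4, 1))
mask[1:3, 1:3, :] = 0.0         # central rectangular hole
damaged = make_damaged(real, mask)

print(damaged[2, 2, 0], damaged[0, 0, 0])  # hole pixel vs. known pixel
```

Pixels inside the hole become zero while known pixels keep their original values.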
As shown in fig. 2, the model of the invention is an encoder-decoder structure. The encoder takes the image to be repaired as input; features are extracted by seven convolution layers, each with a 3×3 kernel, a stride of 2 and a padding of 1, and activated by the nonlinear LeakyReLU activation function. As the number of convolution layers increases, the extracted features are gradually converted from low-level features such as texture and color into high-level features such as semantic information, and feature maps of different scales are obtained through these convolution operations.
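The spatial size produced by each of the seven layers follows the standard convolution output formula; as a sketch, the progression can be computed as below (the 256×256 input size is an assumption for illustration and is not stated in the embodiment):

```python
# Spatial size after a convolution: floor((n + 2*pad - k) / stride) + 1.
# Sketch of the seven-layer encoder with 3x3 kernels, stride 2, padding 1.
def conv_out(n: int, k: int = 3, s: int = 2, p: int = 1) -> int:
    return (n + 2 * p - k) // s + 1

sizes = [256]                    # assumed input resolution
for _ in range(7):
    sizes.append(conv_out(sizes[-1]))

print(sizes)  # [256, 128, 64, 32, 16, 8, 4, 2]
```

Each layer halves the resolution, yielding the pyramid of feature-map scales consumed by the attention module and decoder.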
The perceptual loss compares the features obtained by convolving the real image with the features obtained by convolving the generated image, so that the image content is closer to the global structure, where φ_i represents the features of the i-th layer of the pre-trained VGG-19 network. In this experiment, relu1_1, relu2_1, relu3_1, relu4_1 and relu5_1 were used as the layers for feature extraction.
In step S2, the convolution module is constructed by taking the images to be repaired in the training set as input and the feature maps of each scale, corresponding to the convolution layers, as output, in combination with the style loss function and the perceptual loss function;
when constructing the convolution module, the perceptual loss function is used as follows:
L_perc = Σ_j (1/N_j)·‖φ_j(x) − φ_j(z)‖_1,
where the real image and the restored image are compared through the activation feature maps of ReLU_i_1 (i = 1, 2, 3, 4, 5) of a VGG-19 network pre-trained on ImageNet, N_j denotes the number of elements in the j-th activation layer, φ_j(·) denotes the corresponding activation map, x is the real image, and z is the restored image;
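The per-layer comparison can be sketched as a mean absolute difference over activation maps; the random arrays below stand in for the VGG-19 ReLU_i_1 features, which are not reproduced here:

```python
import numpy as np

# Sketch of the perceptual loss: for each of the five layers, the mean
# absolute difference between the activation maps of the real image and
# the restored image is accumulated. Random "activations" are stand-ins.
def perceptual_loss(feats_x, feats_z):
    return sum(np.abs(fx - fz).sum() / fx.size
               for fx, fz in zip(feats_x, feats_z))

rng = np.random.default_rng(0)
feats_x = [rng.normal(size=(8, 8, 64)) for _ in range(5)]

loss_same = perceptual_loss(feats_x, feats_x)
print(loss_same)  # identical features -> 0.0
```

A uniform offset of 1 in every activation would contribute a loss of exactly 1 per layer, i.e. 5 in total, which makes the normalization easy to check.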
the Gram matrix is first solved at each layer,extraction from VGG-19 networks in representing perceived lossAnd taking a Gram matrix of the feature vector, calculating Euclidean distances among corresponding layers, and finally adding the Euclidean distances of different layers to obtain the final style loss.
When constructing the convolution module, the style loss function is used as follows:
wherein the content of the first and second substances,is a composed ofC of (a) j ×C j The size Gram matrix, x is the real image and z is the restored image.
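The Gram-matrix construction can be sketched as follows; the normalization by C·H·W is a common choice and an assumption here, as the embodiment does not state one:

```python
import numpy as np

# Sketch of the style-loss Gram matrix: for an activation map of shape
# (H, W, C), unroll to F of shape (C, H*W) and form G = F F^T / (C*H*W).
def gram(feat: np.ndarray) -> np.ndarray:
    h, w, c = feat.shape
    f = feat.reshape(h * w, c).T        # (C, H*W)
    return f @ f.T / (c * h * w)        # (C, C)

def style_loss(feats_x, feats_z):
    # Sum of absolute differences between per-layer Gram matrices.
    return sum(np.abs(gram(fx) - gram(fz)).sum()
               for fx, fz in zip(feats_x, feats_z))

rng = np.random.default_rng(1)
fx = rng.normal(size=(8, 8, 16))
print(gram(fx).shape)          # (16, 16)
print(style_loss([fx], [fx]))  # 0.0 for identical features
```

The Gram matrix discards spatial layout and keeps channel correlations, which is why it captures texture "style" rather than content.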
In step S3, the structural attention module is constructed by taking the feature map of each scale as input and the initial feature repair map as output.
Under the assumption that pixels with similar semantics should have similar details, the encoder uses a structural attention transfer (SAT) network layer by layer in a pyramidal fashion to fill in missing regions from the high-level feature map down to the low-level feature map. As shown in fig. 3, the network consists of two parts, attention computation and attention transfer. First, 3×3 feature blocks are selected inside and outside the missing region respectively, and the structural similarity s_{i,j} between each pair of feature blocks is calculated, where d(·) is the Euclidean distance and m and σ are the mean and standard deviation. Applying the softmax function to the similarities then yields the attention score of each feature block:
α_{i,j} = exp(s_{i,j}) / Σ_{j′} exp(s_{i,j′}).
and acquiring an attention score from the high-level feature map, performing attention transfer, and filling the missing area of the adjacent bottom-level feature map by using the attention score in a weighting manner.
Wherein l is the number of layers,is p l The fill-in area of (a) is,are the padded areas of the reconstructed feature map. The final reconstructed feature map is input to the decoder via a skip connection.
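The score-then-fill mechanism can be sketched with toy numbers as below; the similarities and patch values are illustrative, not taken from the embodiment:

```python
import numpy as np

# Sketch of structural attention transfer: softmax over the patch
# similarities computed at layer l, then a weighted fill of one missing
# patch at the adjacent lower layer using known patches.
def softmax(s: np.ndarray) -> np.ndarray:
    e = np.exp(s - s.max())      # subtract max for numerical stability
    return e / e.sum()

# toy similarities of one missing patch to 4 known patches
sims = np.array([2.0, 0.5, 0.1, -1.0])
scores = softmax(sims)

# 4 known 3x3 patches at the lower layer, constant values 1..4
patches = np.stack([np.full((3, 3), v) for v in (1.0, 2.0, 3.0, 4.0)])
filled = np.tensordot(scores, patches, axes=1)   # weighted fill, (3, 3)

print(scores.sum(), filled.shape)
```

The filled patch is a convex combination of the known patches, so its values stay within the range of the candidates, which is what keeps the synthesized texture consistent with existing structure.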
The decoder simultaneously receives as input the reconstructed features from the SAT module and the latent features from the encoder, and then decodes layer by layer, where ψ_L is the reconstructed feature of the L-th layer of the attention transfer network, φ_L is the feature map of the L-th layer of the encoder, d_L is the feature of the L-th layer of the multi-scale decoder, h is the transposed convolution, ⊕ denotes feature concatenation, and λ_1 and λ_2 are the corresponding weighting parameters.
The loss function of the invention consists of five parts: (1) the multi-scale reconstruction loss function L_m, used to refine the generation of the missing region at each scale; (2) the adversarial training loss function L_adv, which produces more realistic images through adversarial training; (3) the edge-preserving loss function L_edge, which controls the edge structure of the generated image; (4) the perceptual loss function L_perc, in which a pre-trained VGG model promotes a better repair effect; (5) the style loss function L_style, which, through the Gram (covariance) matrices of pre-trained VGG features, maintains overall common semantic information and promotes image restoration.
The multi-scale reconstruction loss function L_m computes the L1 distance between the predicted image and the real image at each scale; by controlling this distance, the prediction of the missing region is gradually refined at every scale. The scale reconstruction loss function used to construct the multi-scale decoder is as follows:
L_m = Σ_l λ_l ‖g_l(ψ_l) − x_l‖_1,
where x_l is the real image scaled to the same size as ψ_l, g_l(ψ_l) decodes ψ_l into an RGB image of the same size, and λ_l is the weight of each scale.
In step S6, based on the layered decoder and the multi-scale decoder, the generator of the adversarial network is constructed by combining the edge-preserving loss function, with the images to be repaired in the training set as input and the repaired image as output. In the invention, SN-PatchGAN is used to construct the discriminator. In adversarial training, x is a real image, ⊙ is element-wise multiplication, M is the mask, and z is the composite of the generated image in the missing region and the unmissing region of the original image, that is:
z = ŷ ⊙ (1 − M) + x ⊙ M,
where ŷ is the generator output and M is taken as 1 in the known region.
the penalty function of the final discriminator is expressed as:
When the generator of the adversarial network is constructed, a Sobel filter is used in the edge-preserving loss function, which is expressed as follows:
L_edge = ‖E(z) − E(x)‖_1,
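The Sobel edge extractor E(·) and the L1 comparison can be sketched as below; a minimal valid-region convolution is written out by hand instead of using a filtering library:

```python
import numpy as np

# Sketch of the edge-preserving loss: Sobel gradient magnitudes are
# extracted from each image and compared with an L1 norm.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def conv2_valid(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * k).sum()
    return out

def sobel_edges(img: np.ndarray) -> np.ndarray:
    gx, gy = conv2_valid(img, KX), conv2_valid(img, KY)
    return np.hypot(gx, gy)   # gradient magnitude

def edge_loss(z: np.ndarray, x: np.ndarray) -> float:
    return float(np.abs(sobel_edges(z) - sobel_edges(x)).sum())

x = np.tile(np.arange(8.0), (8, 1))   # horizontal ramp: constant gradient
print(edge_loss(x, x))                # identical images -> 0.0
```

For the ramp image the horizontal Sobel response is a constant 8 everywhere and the vertical response is 0, so the magnitude map is flat, which makes the extractor easy to sanity-check.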
In step S7, the image restoration model is constructed and trained; the global loss function used during training as the total loss is:
L = α_1·L_m + α_2·L_adv + α_3·L_edge + α_4·L_perc + α_5·L_style,
where α_1, α_2, α_3, α_4, α_5 are the corresponding weights, L_m is the multi-scale reconstruction loss function, L_adv is the adversarial training loss function, L_edge is the edge-preserving loss function, L_perc is the perceptual loss function, and L_style is the style loss function.
Finally, the trained repair model is tested: the preprocessed test images are input into the model, and the repair effect of the model is evaluated with the L1 loss, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) indexes.
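The PSNR index used in the evaluation can be sketched as follows (images assumed to be normalized to [0, 1]):

```python
import numpy as np

# Sketch of PSNR: 10 * log10(MAX^2 / MSE), with MAX = 1.0 for images
# normalized to [0, 1]. Higher is better; identical images give infinity.
def psnr(x: np.ndarray, z: np.ndarray, max_val: float = 1.0) -> float:
    mse = float(np.mean((x - z) ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

x = np.zeros((16, 16))
z = x + 0.1          # uniform error of 0.1 -> MSE = 0.01
print(psnr(x, z))    # 20.0 dB
```

SSIM, the other index, additionally compares local luminance, contrast and structure statistics rather than raw pixel error, which is why the two metrics are reported together.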
Fig. 4 shows the result of repairing damaged images with the method of the present invention; it can be seen that a good repair effect is achieved.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be determined by the appended claims.
Claims (10)
1. The image restoration method based on the context structure attention pyramid network is characterized by comprising the following steps:
S1, based on the original real images and their corresponding masks, construct an original real image data set and a data set of images to be repaired, and divide them into a training set and a test set;
S2, construct a convolution module, taking the images to be repaired in the training set as input and the feature map of each scale, output by the corresponding convolution layer, as output, in combination with a style loss function and a perceptual loss function;
S3, construct a structural attention module, taking the feature map of each scale as input and the initial feature repair map as output;
S4, construct a layered decoder, taking the images to be repaired in the training set as input and the initial feature repair map as output;
S5, construct a multi-scale decoder, taking the scale feature maps and the initial feature repair map as input and the repaired image at each scale as output, in combination with a scale reconstruction loss function;
S6, based on the layered decoder and the multi-scale decoder, construct the generator of an adversarial network, taking the images to be repaired in the training set as input and the repaired image as output, in combination with an edge-preserving loss function;
and S7, from the generator of the adversarial network, together with the discriminator and the adversarial training loss function, construct and train an image restoration model, taking the images to be repaired in the training set as input and the corresponding original real images as the target, using the global loss function during training to obtain the image restoration model.
2. The image restoration method based on the context structure attention pyramid network according to claim 1, characterized in that in step S2 the convolution module comprises 7 convolution layers, the convolution kernel of each layer is 3×3, the stride is 2, the padding is 1, and a feature map is extracted at each scale from deep to shallow.
3. The image restoration method based on the context structure attention pyramid network according to claim 1, characterized in that step S3 comprises the following sub-steps:
S3.1, select n×n feature blocks inside and outside the missing region on two adjacent scale feature maps, and compute the structural similarity s_{i,j} between each pair of feature blocks, where d(·) is the Euclidean distance, and m and σ are the mean and standard deviation, respectively;
S3.2, apply the softmax function to the similarities to obtain the attention score of each feature block:
α_{i,j} = exp(s_{i,j}) / Σ_{j′} exp(s_{i,j′});
S3.3, after obtaining the attention scores from the high-level feature map, perform attention transfer, filling the missing region of the adjacent low-level feature map by weighting known patches with the attention scores.
4. The image restoration method based on the context structure attention pyramid network according to claim 3, characterized in that step S5 is specifically: the multi-scale decoder takes as input both the initial feature repair map output by the structural attention module and the scale feature maps output by the convolution module, and then decodes layer by layer, where ψ_L is the reconstructed feature of the L-th layer of the attention transfer network, φ_L is the feature map of the L-th layer of the encoder, d_L is the feature of the L-th layer of the multi-scale decoder, h is the transposed convolution, ⊕ denotes feature concatenation, and λ_1 and λ_2 are the corresponding parameters.
7. The image restoration method based on the context structure attention pyramid network according to claim 1, characterized in that in step S2, when constructing the convolution module, the perceptual loss function is as follows:
L_perc = Σ_j (1/N_j)·‖φ_j(x) − φ_j(z)‖_1,
where the real image and the restored image are compared through the activation feature maps of ReLU_i_1 (i = 1, 2, 3, 4, 5) of a VGG-19 network pre-trained on ImageNet, N_j denotes the number of elements in the j-th activation layer, φ_j(·) denotes the corresponding activation map, x is the real image, and z is the repaired image;
when constructing the convolution module, the style loss function is used as follows:
L_style = Σ_j ‖G_j(x) − G_j(z)‖_1,
where G_j(·) is the C_j × C_j Gram matrix constructed from the activation map φ_j.
8. The image restoration method based on the context structure attention pyramid network according to claim 1, characterized in that in step S6, when constructing the generator of the adversarial network, the edge-preserving loss function is as follows:
L_edge = ‖E(z) − E(x)‖_1,
where E(·) is a Sobel filter extracting image edges, x is the real image, and z is the image generated by the model.
9. The image restoration method based on the context structure attention pyramid network according to claim 1, characterized in that in step S5, when constructing the multi-scale decoder, the scale reconstruction loss function is as follows:
L_m = Σ_l λ_l ‖g_l(ψ_l) − x_l‖_1,
where x_l is the real image scaled to the same size as ψ_l, g_l(ψ_l) decodes ψ_l into an RGB image of the same size, and λ_l is the weight of each scale.
10. The image restoration method based on the context structure attention pyramid network according to claim 1, characterized in that in step S7 an image restoration model is constructed and trained, and the global loss function used during training is as follows:
L = α_1·L_m + α_2·L_adv + α_3·L_edge + α_4·L_perc + α_5·L_style,
where L_m is the multi-scale reconstruction loss function, L_adv is the adversarial training loss function, L_edge is the edge-preserving loss function, L_perc is the perceptual loss function, L_style is the style loss function, and α_1, α_2, α_3, α_4, α_5 are the corresponding weights.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211664365.5A CN115829880A (en) | 2022-12-23 | 2022-12-23 | Image restoration method based on context structure attention pyramid network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115829880A true CN115829880A (en) | 2023-03-21 |
Family
ID=85518009
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211664365.5A Pending CN115829880A (en) | 2022-12-23 | 2022-12-23 | Image restoration method based on context structure attention pyramid network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115829880A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116523985A (en) * | 2023-05-06 | 2023-08-01 | 兰州交通大学 | Structure and texture feature guided double-encoder image restoration method |
CN116523985B (en) * | 2023-05-06 | 2024-01-02 | 兰州交通大学 | Structure and texture feature guided double-encoder image restoration method |
CN116258652A (en) * | 2023-05-11 | 2023-06-13 | 四川大学 | Text image restoration model and method based on structure attention and text perception |
CN116258652B (en) * | 2023-05-11 | 2023-07-21 | 四川大学 | Text image restoration model and method based on structure attention and text perception |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |