CN116523985A - Structure and texture feature guided double-encoder image restoration method - Google Patents

Structure and texture feature guided double-encoder image restoration method

Info

Publication number
CN116523985A
Authority
CN
China
Prior art keywords
image
representing
features
network
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310501736.6A
Other languages
Chinese (zh)
Other versions
CN116523985B (en)
Inventor
张家骏
廉敬
刘津颖
刘冀钊
张怀堃
董子龙
郑礼
汤春阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou Jiaotong University
Original Assignee
Lanzhou Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanzhou Jiaotong University filed Critical Lanzhou Jiaotong University
Priority to CN202310501736.6A priority Critical patent/CN116523985B/en
Publication of CN116523985A publication Critical patent/CN116523985A/en
Application granted granted Critical
Publication of CN116523985B publication Critical patent/CN116523985B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06T 7/40: Image analysis; analysis of texture
    • G06N 3/045: Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N 3/0455: Auto-encoder networks; encoder-decoder networks
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 5/80: Image enhancement or restoration; geometric correction
    • G06V 10/54: Extraction of image or video features relating to texture
    • G06V 10/757: Matching configurations of points or features
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/20081: Special algorithmic details; training, learning
    • G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]
    • Y02P 90/30: Computing systems specially adapted for manufacturing (climate change mitigation technologies in the production or processing of goods)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a structure and texture feature guided double-encoder image restoration method, belonging to the technical field of image restoration. It provides a double-encoder coarse restoration network guided by structural and texture features, together with a fine restoration network based on a long-short-term attention mechanism and multi-scale receptive fields, thereby realizing joint restoration of the structure and texture of defective images.

Description

Structure and texture feature guided double-encoder image restoration method
Technical Field
The invention relates to the technical field of image restoration, in particular to a double-encoder image restoration method guided by structure and texture features.
Background
The purpose of defective-image restoration is to restore the masked area of a digital image: fill it with reasonable, vivid content that carries correct contextual semantics, recover the complete picture, and improve its texture. It is an important task in computer vision and can serve as an image editing tool for removing unwanted objects and restoring damaged images. Early image restoration methods were mainly diffusion-based or block-based. Diffusion-based methods use the heat-diffusion equation from physics to propagate information from around the area to be repaired into the repair region through partial differential equations and variational principles; this approach is only suitable for repairing small-scale defects. Block-based methods first select a pixel on the boundary of the region to be repaired, take a texture block of suitable size centered on that pixel according to the texture characteristics of the image, and then search the surroundings of the region for the closest matching texture block to replace it.
However, when key regions and important structures are missing, these methods are no longer applicable. With the continuous development of deep learning, restoration methods based on convolutional neural networks (CNN) and generative adversarial networks (GAN) have been widely applied and provide effective tools for image restoration. Existing image restoration methods generally use an encoder-decoder to extract the structure, texture and contextual semantics of the image, and then rely on a generative adversarial network to complete a visually plausible restoration of the defective image.
Although existing methods can generate realistic and semantically credible structure and texture within the mask region, they typically use either a single encoder-decoder, or two encoder-decoders that repair structure and texture separately, ignoring the association between image structure and texture; this leads to insufficient structural expression or mismatched texture in the restored image, and the image generation process lacks guidance from jointly extracted structural and texture features. Therefore, the invention provides a double-encoder coarse restoration network guided by structural and texture features and a fine restoration network based on a long-short-term attention mechanism and multi-scale receptive fields, realizing joint restoration of the structure and texture of defective images.
Disclosure of Invention
The present invention aims to solve the above-mentioned problems, and to provide a structure and texture feature guided dual encoder image restoration method.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: the double-encoder image restoration method comprises a coarse restoration network and a fine restoration network, wherein the implementation steps of the coarse restoration network and the fine restoration network are as follows:
S1: The defective image to be repaired and the binary mask image (mask = 1 in the defective region) together serve as input to a structural encoder and a texture encoder. After the two encoders extract image features layer by layer, the structural and texture feature distributions are fitted to two externally supplied Gaussian distributions N(0, I).
S2: The structural feature space and the texture feature space are mapped to the latent space by a cross-semantic attention module, and the decoder restores the image mask region from random samples drawn from the latent space.
S3: in the process of feature extraction, pyramid fusion is carried out on features extracted by the structural encoder and the texture encoder, and the fused feature space is used for guiding image restoration of a decoder to obtain a rough restoration result.
S4: using fused images I fuse And mask imageM together serve as input to the fine repair network.
S5: removing artifacts of a fused image mask region of a thin repair network by using a residual gate convolution network, and extracting image characteristic information;
s6: 3 different receptive fields are designed, useful local features and structural features are automatically filtered, interference of useless detail features in an image is effectively reduced, and the sizes of the 3 different receptive fields are respectively 3×3, 5×5 and 7×7;
s7: and a long-short term attention module is added to solve the problems of fuzzy areas and inconsistent context semantics in the image. In the long-period attention module, the attention weight matrix can be used for linking the decoding characteristic control space context, acquiring the coding characteristic of the fine repair network and completing the repair of the mask area by combining the network decoding characteristic and the coding characteristic.
S8: the decoder captures remote features by linking remote spatial contexts, maintains global semantic consistency of the image, selects finer granularity features and valid semantic features of the encoded features according to local features of the repair image, and gradually reconstructs the image with short-term and long-term attention scores to obtain a fine repair image with high resolution characteristics.
Further, the fitting of the two distributions in S1 adopts the Kullback-Leibler (KL) divergence;
the KL divergence regularizes the learned importance sampling functions, constraining them to a latent prior;
the latent prior distribution is defined as a Gaussian distribution, and the KL regularization of the structural and texture encoders is as follows:
where I_m denotes the damaged image; z denotes the latent space, i.e. the compressed data space corresponding to the structural and texture features, in which similar data points lie close together; q_ψ and its texture counterpart are the importance sampling functions of the image structure distribution and the texture distribution, respectively; N(0, I) denotes a Gaussian distribution; L_KL^S denotes the KL divergence loss function of the structural features; L_KL^T denotes the KL divergence loss function of the texture features.
Further, the cross-semantic attention module in S2 is placed after the double-encoder module. The structural encoder feature space F_S and the texture encoder feature space F_T are each mapped to the latent space by a 1×1 convolution filter. The cross-semantic attention module computes attention over the two feature spaces to obtain their attention scores,
where
s_ij = Q(F_T)^T K(F_S)    (4)
In formulas (3)-(4), β_{j,i} indicates the extent to which the model attends to the i-th location when synthesizing the j-th region; N denotes the number of pixels of the coarsely restored image; s_ij is the product of Q^T and K in the cross-attention module; Q(F_T) denotes the image texture features; K(F_S) denotes the image structural features. The output O of the cross-semantic attention module is finally computed as:
O = α F_ST + F_S    (6)
where
V(F_S) = W_v F_S    (7)
In formulas (5)-(7), F_ST denotes the attention score; V(F_S) denotes the image structural features to be weighted, and W_v is a 1×1 convolution filter; α is a learnable scale parameter balancing the weights of F_ST and F_S, and its initial value is set to 0. The cross-semantic attention network begins by learning the correlation of structural and texture features and finally extends to learning the interdependence and association of structural and texture features from the feature maps.
Further, the coarse restoration result in S3 is reconstructed pixel by pixel using the mean absolute error (MAE) distance:
In formulas (8)-(9), I_out^C denotes the coarse restoration result of the image; I_g denotes the ground-truth image; M denotes the binary mask image; L_hole^C denotes the reconstruction loss function of the defective image mask region; L_valid^C denotes the reconstruction loss function of the non-masked region of the defective image. Thus, the pixel-by-pixel reconstruction loss L_r^C is:
In formula (10), λ_rec is the reconstruction loss balance factor and is set to 20. Furthermore, for the coarse restoration network in Fig. 1, the LSGAN method [9] is adopted to define the adversarial loss; compared with the traditional GAN loss function it makes network training more stable and the generated images more natural, and it is defined as follows:
In formulas (11)-(12), D denotes the discriminator of the GAN; L_D denotes the adversarial loss function of the GAN discriminator; E_{Ig~pdata(Ig)} denotes the expectation over the probability density of the ground-truth images; L_G denotes the adversarial loss function of the GAN generator; the remaining expectation is taken over the probability density of the coarsely reconstructed images.
In summary, the total loss of the coarse repair network is defined as:
further, the fused image I in S4 fuse The formula of (c) is defined as follows:
I fuse =I out_m +(1-M)*I g (14)
in equation (14), the image I is fused fuse Mask area I for coarsely restored image out_m =M×I C out Sum-gold standard image I g Is included in the image data.
Further, the attention weight matrix β_{j,i} in S7 is computed as follows:
where
s_ij = Q(f_di)^T K(f_dj)    (16)
In formulas (15)-(16), β_{j,i} indicates the extent to which the model attends to the i-th location when synthesizing the j-th region; N denotes the number of pixels of the fine restoration image; f_dj denotes the decoding features; s_ij is the product of Q^T and K in the long-short-term attention module; K(f_dj) denotes the input information corresponding to the decoding features; Q(f_di)^T denotes the query vector corresponding to the decoding features:
Q(f_di)^T = (W_q f_di)^T    (17)
In formula (17), W_q is a 1×1 convolution filter. The self-attention layer of the long-short-term attention module is expressed as follows:
V_D(f_dj) denotes the input information corresponding to the decoding features to be weighted. To combine the fine-grained encoder features with the decoder features, skip connections link the encoder and decoder layers of the global refinement network; long-range spatial context features are obtained from the encoder-layer features and the scores of the attention weight matrix β_{j,i}, and the output F_out of the long-short-distance attention layer is computed as follows:
In formula (19), V_E(f_ei) denotes the input information corresponding to the encoding features to be weighted. The output O of the whole long-short-term attention module is computed as:
O = γ(1 - M) F_out + M f_e    (20)
where f_e denotes the encoding features of the long-range space; M denotes the binary mask; γ is a learnable scale parameter balancing the weights of F_out and f_e.
Further, the first training target of the refinement network in S7 is the reconstruction loss L_r^R. As with the reconstruction loss in the coarse restoration network, the MAE is used for pixel-by-pixel reconstruction:
In formulas (21)-(22), I_out^R denotes the fine restoration result of the image; L_hole^R denotes the reconstruction loss function of the fused-image mask region; L_valid^R denotes the reconstruction loss function of the non-masked region of the fused image. The invention also adds a perceptual loss [10] and a style loss [11]: feature extraction is performed on the image using a pretrained VGG-16 network, and the two losses are computed on the spatial features. The perceptual loss L_per^R is defined as follows:
In formula (23), F_i denotes the i-th layer feature map of the pretrained VGG-16 network. The style loss L_style^R is defined as follows:
where G_i denotes a Gram matrix, i.e. the covariance matrix between features, representing the correlation between the features. In summary, the overall loss L^R of the global refinement network is:
where λ_rec, λ_p and λ_s are balance factors.
Compared with the prior art, the invention has the following beneficial effects:
(1) A model framework in which a double-encoder coarse restoration network extracts structural and texture features;
(2) A method and technical route by which the double-encoder coarse restoration network guides the decoder to perform image restoration;
(3) A fine restoration network architecture based on a long-short-term attention mechanism and multi-scale receptive fields, together with the algorithm parameter settings for linking long-range spatial context.
Drawings
FIG. 1 is a flow chart of the double-encoder generative image restoration method of the present invention;
FIG. 2 is a schematic diagram of a cross-semantic attention module of the present invention;
FIG. 3 is a schematic diagram of a long-short term attention module according to the present invention;
FIG. 4 is a comparison of the visual effects of six image restoration methods, including the method of the present invention.
Detailed Description
The invention is further described below in connection with specific embodiments, so that the technical means, creative features, objectives and effects of the invention are easy to understand.
A flow chart of the double-encoder image restoration method of the invention is shown in Fig. 1. The method includes two stages: a coarse restoration network and a fine restoration network. The training targets of the coarse restoration network comprise regularization of the image data feature distributions, the image reconstruction loss and the network adversarial loss.
The coarse restoration network comprises the following steps:
(1) The defective image to be repaired and the binary mask image (mask = 1 in the defective region) together serve as input to a structural encoder and a texture encoder. After the two encoders extract image features layer by layer, the structural and texture feature distributions are fitted to two externally supplied Gaussian distributions N(0, I). The fitting of the two distributions adopts the Kullback-Leibler (KL) divergence. The KL divergence regularizes the learned importance sampling functions, constraining them to a latent prior. The latent prior distribution is defined as a Gaussian distribution, and the KL regularization of the structural and texture encoders is as follows:
where I_m denotes the damaged image; z denotes the latent space, i.e. the compressed data space corresponding to the structural and texture features, in which similar data points lie close together; q_ψ and its texture counterpart are the importance sampling functions of the image structure distribution and the texture distribution, respectively; N(0, I) denotes a Gaussian distribution; L_KL^S denotes the KL divergence loss function of the structural features; L_KL^T denotes the KL divergence loss function of the texture features.
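The KL terms are not reproduced as formulas in this text, so the following is a minimal sketch under an explicit assumption: each encoder predicts the mean and log-variance of a diagonal Gaussian, and the closed-form KL divergence to the standard prior N(0, I) is used as the regularizer L_KL^S / L_KL^T.

```python
import torch

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch."""
    return 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=1))

# Hypothetical usage: each encoder outputs (mu, logvar) for its feature distribution.
# L_KL_S = kl_to_standard_normal(mu_struct, logvar_struct)
# L_KL_T = kl_to_standard_normal(mu_texture, logvar_texture)
```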
(2) The structural feature space and the texture feature space are mapped to the latent space by a cross-semantic attention module (shown in Fig. 2), and the decoder restores the image mask region from random samples drawn from the latent space.
In Fig. 2, the cross-semantic attention module is placed after the double-encoder module. The structural encoder feature space F_S and the texture encoder feature space F_T are each mapped to the latent space by a 1×1 convolution filter. The cross-semantic attention module computes attention over the two feature spaces to obtain their attention scores,
where
s_ij = Q(F_T)^T K(F_S)    (4)
In formulas (3)-(4), β_{j,i} indicates the extent to which the model attends to the i-th location when synthesizing the j-th region; N denotes the number of pixels of the coarsely restored image; s_ij is the product of Q^T and K in the cross-attention module; Q(F_T) denotes the image texture features; K(F_S) denotes the image structural features. The output O of the cross-semantic attention module is finally computed as:
O = α F_ST + F_S    (6)
where
V(F_S) = W_v F_S    (7)
In formulas (5)-(7), F_ST denotes the attention score; V(F_S) denotes the image structural features to be weighted, and W_v is a 1×1 convolution filter; α is a learnable scale parameter balancing the weights of F_ST and F_S, and its initial value is set to 0. The cross-semantic attention network begins by learning the correlation of structural and texture features and finally extends to learning the interdependence and association of structural and texture features from the feature maps.
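A sketch of the cross-semantic attention computation of formulas (3)-(7), assuming F_S and F_T are feature maps of shape (B, C, H, W); the 1×1 convolutions for Q, K and V, the softmax over s_ij, the output O = αF_ST + F_S and the zero-initialized scale α follow the description above, while the tensor shapes and layer names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossSemanticAttention(nn.Module):
    """Texture queries Q(F_T) attend over structure keys K(F_S); structure values
    V(F_S) are re-weighted and added back: O = alpha * F_ST + F_S (eq. (6))."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, kernel_size=1)  # Q(F_T)
        self.k = nn.Conv2d(channels, channels, kernel_size=1)  # K(F_S)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)  # V(F_S) = W_v F_S
        self.alpha = nn.Parameter(torch.zeros(1))              # initialised to 0

    def forward(self, f_s, f_t):
        b, c, h, w = f_s.shape
        q = self.q(f_t).view(b, c, h * w)
        k = self.k(f_s).view(b, c, h * w)
        v = self.v(f_s).view(b, c, h * w)
        s = torch.bmm(q.transpose(1, 2), k)          # s_ij = Q(F_T)^T K(F_S), eq. (4)
        beta = F.softmax(s, dim=-1)                  # attention weights beta_{j,i}
        f_st = torch.bmm(v, beta.transpose(1, 2))    # attention score F_ST
        f_st = f_st.view(b, c, h, w)
        return self.alpha * f_st + f_s               # eq. (6)
```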
(3) In the feature extraction process, the features extracted by the structural encoder and the texture encoder undergo pyramid fusion, and the fused feature space is used to guide the decoder's image restoration, yielding a coarse restoration result. For the coarse restoration result, the invention uses the mean absolute error (MAE) distance for pixel-by-pixel reconstruction:
In formulas (8)-(9), I_out^C denotes the coarse restoration result of the image; I_g denotes the ground-truth image; M denotes the binary mask image; L_hole^C denotes the reconstruction loss function of the defective image mask region; L_valid^C denotes the reconstruction loss function of the non-masked region of the defective image. Thus, the pixel-by-pixel reconstruction loss L_r^C is:
In formula (10), λ_rec is the reconstruction loss balance factor and is set to 20. Furthermore, for the coarse restoration network in Fig. 1, the LSGAN method [9] is adopted to define the adversarial loss; compared with the traditional GAN loss function it makes network training more stable and the generated images more natural, and it is defined as follows:
In formulas (11)-(12), D denotes the discriminator of the GAN; L_D denotes the adversarial loss function of the GAN discriminator; E_{Ig~pdata(Ig)} denotes the expectation over the probability density of the ground-truth images; L_G denotes the adversarial loss function of the GAN generator; the remaining expectation is taken over the probability density of the coarsely reconstructed images.
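Since formulas (8)-(12) are not reproduced in this text, the following sketch assumes the standard masked-L1 form for the hole/valid reconstruction terms and the least-squares (LSGAN) form for the adversarial terms; the exact normalization and weighting in the patent's formulas may differ.

```python
import torch
import torch.nn.functional as F

def coarse_repair_losses(i_c_out, i_g, mask, discriminator, lambda_rec=20.0):
    """Assumed masked MAE reconstruction (eqs. (8)-(10)) and LSGAN terms (eqs. (11)-(12))."""
    l_hole = F.l1_loss(mask * i_c_out, mask * i_g)                # L_hole^C, masked region
    l_valid = F.l1_loss((1 - mask) * i_c_out, (1 - mask) * i_g)   # L_valid^C, known region
    l_rec = lambda_rec * l_hole + l_valid                         # assumed weighting, eq. (10)

    # LSGAN: the discriminator pushes real outputs toward 1 and fake outputs toward 0.
    l_d = 0.5 * (torch.mean((discriminator(i_g) - 1.0) ** 2)
                 + torch.mean(discriminator(i_c_out.detach()) ** 2))
    l_g = 0.5 * torch.mean((discriminator(i_c_out) - 1.0) ** 2)
    return l_rec, l_d, l_g
```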
In summary, the total loss of the coarse repair network is defined as:
In formula (13), λ_KL denotes the KL divergence loss balance factor, which is set to 20. After the coarse restoration process is completed, the coarse result I_out^C restores the masked region of the image and, thanks to the gated convolution design in the coarse restoration network, eliminates artifacts caused by the mask region; however, two problems remain:
(1) the image mask area is still blurred after being repaired;
(2) the completed content lacks overall semantic consistency with the rest of the image, and the context semantics are inconsistent.
To solve these problems, the invention designs a global fine restoration network that adopts multi-scale feature extraction and a long-short-term attention module to eliminate blurred regions in the image and unify the global semantics, improving both the resolution of the image mask region and the consistency of the global semantics.
The implementation steps of the double-encoder image fine restoration adopt the following algorithm:
(1) The fused image I_fuse and the mask image M together serve as input to the fine restoration network. The fused image I_fuse is defined as follows:
I_fuse = I_out_m + (1 - M) * I_g    (14)
In equation (14), the fused image I_fuse combines the mask region of the coarsely restored image, I_out_m = M × I_out^C, with the non-masked region of the ground-truth image I_g.
(2) A residual gated convolution network removes artifacts in the mask region of the fused image and extracts image feature information (in Fig. 1, the residual gated convolution network is represented by blue rectangular blocks);
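A sketch of one residual gated-convolution block of the kind referred to here, assuming the common feature-times-sigmoid-gate formulation; the residual wiring, kernel size and activation are illustrative assumptions rather than the patent's exact design.

```python
import torch
import torch.nn as nn

class ResidualGatedConv(nn.Module):
    """Gated convolution with a residual connection (assumed block design)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.feature = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.gate = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.act = nn.ELU()

    def forward(self, x):
        gated = self.act(self.feature(x)) * torch.sigmoid(self.gate(x))
        return x + gated  # residual path keeps valid-region features intact
```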
(3) Three receptive fields of different sizes (3×3, 5×5 and 7×7) are designed to automatically filter useful local and structural features, effectively reducing interference from useless detail features in the image;
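A sketch of the three-branch multi-scale extraction with 3×3, 5×5 and 7×7 receptive fields; how the branch outputs are merged (here, concatenation followed by a 1×1 convolution) is an assumption, since the patent does not state the fusion rule.

```python
import torch
import torch.nn as nn

class MultiScaleReceptiveField(nn.Module):
    """Parallel 3x3 / 5x5 / 7x7 convolution branches whose outputs are fused."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in (3, 5, 7)]
        )
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        feats = [torch.relu(branch(x)) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))
```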
(4) A long-short-term attention module is added to address blurred regions and inconsistent context semantics in the image (the module is shown as a red rectangular block in Fig. 1 and in detail in Fig. 3). In the long-short-term attention module, the attention weight matrix links the decoding features to control the spatial context, acquires the encoding features of the fine restoration network, and completes restoration of the mask region by combining the network's decoding and encoding features. The attention weight matrix β_{j,i} is computed as follows:
where
s_ij = Q(f_di)^T K(f_dj)    (16)
In formulas (15)-(16), β_{j,i} indicates the extent to which the model attends to the i-th location when synthesizing the j-th region; N denotes the number of pixels of the fine restoration image; f_dj denotes the decoding features; s_ij is the product of Q^T and K in the long-short-term attention module; K(f_dj) denotes the input information corresponding to the decoding features; Q(f_di)^T denotes the query vector corresponding to the decoding features:
Q(f_di)^T = (W_q f_di)^T    (17)
In formula (17), W_q is a 1×1 convolution filter. The self-attention layer of the long-short-term attention module is expressed as follows:
V_D(f_dj) denotes the input information corresponding to the decoding features to be weighted. To combine the fine-grained encoder features with the decoder features, skip connections link the encoder and decoder layers of the global refinement network; long-range spatial context features are obtained from the encoder-layer features and the scores of the attention weight matrix β_{j,i}, and the output F_out of the long-short-distance attention layer is computed as follows:
In formula (19), V_E(f_ei) denotes the input information corresponding to the encoding features to be weighted. The output O of the whole long-short-term attention module is computed as:
O = γ(1 - M) F_out + M f_e    (20)
where f_e denotes the encoding features of the long-range space (represented in Fig. 3 by orange matrix blocks); M denotes the binary mask; γ is a learnable scale parameter balancing the weights of F_out and f_e.
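A sketch of the long-short-term attention of formulas (15)-(20): attention weights β are computed from decoder features, a short-term output re-weights the decoder values V_D(f_d) (eq. (18)), a long-term output re-weights the skip-connected encoder values V_E(f_e) (eq. (19)), and the final output is O = γ(1 - M)F_out + M f_e (eq. (20)). The tensor shapes, the reuse of the same β for both paths, and the summation used to form F_out are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LongShortTermAttention(nn.Module):
    """Attention weights from decoder features, applied to decoder values
    (short-term) and to skip-connected encoder values (long-term)."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)    # W_q, eq. (17)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v_d = nn.Conv2d(channels, channels, 1)  # V_D for decoder features
        self.v_e = nn.Conv2d(channels, channels, 1)  # V_E for encoder features
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, f_d, f_e, mask):
        b, c, h, w = f_d.shape
        q = self.q(f_d).view(b, c, -1)
        k = self.k(f_d).view(b, c, -1)
        beta = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)              # eqs. (15)-(16)
        short = torch.bmm(self.v_d(f_d).view(b, c, -1), beta.transpose(1, 2))  # eq. (18)
        long_ = torch.bmm(self.v_e(f_e).view(b, c, -1), beta.transpose(1, 2))  # eq. (19)
        f_out = (short + long_).view(b, c, h, w)     # assumed combination into F_out
        return self.gamma * (1 - mask) * f_out + mask * f_e                    # eq. (20)
```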
(5) The decoder captures long-range features by linking long-range spatial contexts (in the fine restoration network of Fig. 1, the context links are drawn as orange solid lines), maintains global semantic consistency of the image, selects finer-grained features and valid semantic features from the encoded features according to the local context of the restored image, and gradually reconstructs the image using the short-term and long-term attention scores, obtaining a fine restoration result with high-resolution characteristics.
The first training target of the refinement network is the reconstruction loss L_r^R. As with the reconstruction loss in the coarse restoration network, the MAE is used for pixel-by-pixel reconstruction:
In formulas (21)-(22), I_out^R denotes the fine restoration result of the image; L_hole^R denotes the reconstruction loss function of the fused-image mask region (as shown in Fig. 1); L_valid^R denotes the reconstruction loss function of the non-masked region of the fused image. The invention also adds a perceptual loss [10] and a style loss [11]: feature extraction is performed on the image using a pretrained VGG-16 network, and the two losses are computed on the spatial features. The perceptual loss L_per^R is defined as follows:
In formula (23), F_i denotes the i-th layer feature map of the pretrained VGG-16 network. The style loss L_style^R is defined as follows:
where G_i denotes a Gram matrix, i.e. the covariance matrix between features, representing the correlation between the features. In summary, the overall loss L^R of the global refinement network is:
where λ_rec, λ_p and λ_s are balance factors.
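A sketch of the perceptual and style losses of formulas (23)-(24), computed on feature maps of a pretrained VGG-16; which layers are used, the L1 distance, and the Gram-matrix normalization are assumptions, since the patent only states that a trained VGG-16 extracts the features.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class VGGFeatures(torch.nn.Module):
    """Frozen VGG-16 truncated at a few intermediate layers (assumed layer choice)."""
    def __init__(self, layer_ids=(3, 8, 15)):
        super().__init__()
        self.net = vgg16(pretrained=True).features.eval()
        for p in self.net.parameters():
            p.requires_grad_(False)
        self.layer_ids = set(layer_ids)

    def forward(self, x):
        feats = []
        for i, layer in enumerate(self.net):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

def gram(f):
    """Gram matrix G_i of a feature map (assumed normalization), eq. (24)."""
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def perceptual_and_style_loss(vgg, out, gt):
    l_per, l_style = 0.0, 0.0
    for fo, fg in zip(vgg(out), vgg(gt)):
        l_per += F.l1_loss(fo, fg)                 # eq. (23), assumed L1 distance
        l_style += F.l1_loss(gram(fo), gram(fg))   # eq. (24)
    return l_per, l_style
```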
Comparison of experimental results:
The color images used in the experiments of the invention all come from the CelebA-HQ dataset [12]. The high-resolution CelebA-HQ dataset contains 30,000 face images; 27,000 images were randomly selected for training and 300 images for testing.
The superior performance of the method of the invention was verified by comparison with five other representative algorithms: the GC, PIC, MEDFE, RFR and MADF algorithms. For image quality evaluation, several indexes commonly used in image restoration tasks are adopted: L1 error, peak signal-to-noise ratio (PSNR), structural similarity (SSIM), Fréchet inception distance (FID), and learned perceptual image patch similarity (LPIPS). The experimental results on the CelebA-HQ dataset are shown in Table 1 (the best evaluation result is shown in bold and the second best is underlined).
Table 1: Comparison of experimental results on the CelebA-HQ dataset
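For reference (the patent does not give the metric formulas), a minimal sketch of how the L1 error and PSNR are typically computed for images scaled to [0, 1]; SSIM, FID and LPIPS are usually taken from standard library implementations.

```python
import torch

def l1_error(pred, target):
    """Mean absolute pixel error between restored and ground-truth images."""
    return torch.mean(torch.abs(pred - target))

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```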
As can be seen from the color image restoration results in Table 1, in the restoration task where the mask area is the central region of the image, the proposed method achieves the best results on the L1 and LPIPS indexes among the six algorithms, indicating that the pixel-value error between the experimental images obtained by the method and the ground-truth images is the smallest of all the tested methods and the image restoration effect is the best. On the PSNR and SSIM indexes the method is second only to the comparison algorithm GC, indicating that the experimental images obtained by the method have strong consistency with the ground-truth images. On the FID index the method ranks second among all comparison algorithms, indicating that the experimental images obtained by the method have strong correlation with the ground-truth images.
In the restoration task where the mask area is a random region of the image, the proposed method achieves the best results on the L1, SSIM and FID indexes among the six algorithms; the pixel-value error between the experimental images and the ground-truth images is the smallest of all the tested methods, and the consistency and correlation are the strongest, indicating that the overall restoration effect of the image is the best.
Fig. 4 shows a visual comparison of the six image restoration methods. The experimental gallery is the CelebA-HQ dataset. The first row of images shows the results when the mask area is the central region of the image, and the second row shows the results when the mask area is a random region. The first column shows the ground-truth images; the second column the defective images; the third column the restoration results of the MEDFE algorithm; the fourth column the results of the GC algorithm; the fifth column the results of the MADF algorithm; the sixth column the results of the RFR algorithm; the seventh column the results of the PIC algorithm; and the eighth column the results of the proposed method.
In the CelebA-HQ experiments, MADF and the method of the invention can accurately restore facial features such as the eyes, nose, mouth and hair. For large-area mask regions, the filling effect of the other algorithms is poor, mainly manifested as blurred facial features and rough texture. In the restored images obtained by the proposed method, the facial features are clearer and more natural, and the visual effect is better.
The experimental environment used PyTorch 1.8.0 and Python 3.6.13 with an NVIDIA GeForce RTX 3090 GPU. The experimental network contains 14M trainable parameters and is trained with orthogonal initialization and the Adam optimization algorithm. The fixed learning rate of network training is γ = 10^-4. The balance factors are empirically set to λ_rec = 20, λ_KL = 20, λ_p = 0.05 and λ_s = 100.
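A minimal sketch of the training configuration stated above (orthogonal initialization, Adam, fixed learning rate 10^-4); the model variable and the default Adam betas are illustrative assumptions.

```python
import torch
import torch.nn as nn

def orthogonal_init(module):
    """Orthogonal initialization for convolution and linear weights."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.orthogonal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Hypothetical usage with a model assembled from the modules sketched above:
# model.apply(orthogonal_init)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```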
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted only for clarity; the specification should be taken as a whole, and the technical solutions of the various embodiments may be suitably combined to form other embodiments that will be apparent to those skilled in the art.

Claims (7)

1. A structure and texture feature guided double-encoder image restoration method, characterized by comprising a coarse restoration network and a fine restoration network, wherein the implementation steps of the coarse restoration network and the fine restoration network are as follows:
S1: The defective image to be repaired and the binary mask image (mask = 1 in the defective region) together serve as input to a structural encoder and a texture encoder; after the two encoders extract image features layer by layer, the structural and texture feature distributions are fitted to two externally supplied Gaussian distributions N(0, I).
S2: The structural feature space and the texture feature space are mapped to the latent space by a cross-semantic attention module, and the decoder restores the image mask region from random samples drawn from the latent space.
S3: In the feature extraction process, pyramid fusion is carried out on the features extracted by the structural encoder and the texture encoder, and the fused feature space is used to guide the decoder's image restoration, obtaining a coarse restoration result.
S4: The fused image I_fuse and the mask image M together serve as input to the fine restoration network.
S5: A residual gated convolution network removes artifacts in the mask region of the fused image and extracts image feature information;
S6: Three receptive fields of different sizes (3×3, 5×5 and 7×7) are designed to automatically filter useful local and structural features, effectively reducing interference from useless detail features in the image;
S7: A long-short-term attention module is added to address blurred regions and inconsistent context semantics in the image; in the long-short-term attention module, the attention weight matrix links the decoding features to control the spatial context, acquires the encoding features of the fine restoration network, and completes restoration of the mask region by combining the network's decoding and encoding features.
S8: The decoder captures long-range features by linking long-range spatial contexts, maintains global semantic consistency of the image, selects finer-grained features and valid semantic features from the encoded features according to the local features of the restored image, and gradually reconstructs the image using the short-term and long-term attention scores to obtain a fine restoration result with high-resolution characteristics.
2. The structure and texture feature guided dual-encoder image restoration method according to claim 1, wherein the fitting of the two distributions in S1 adopts the Kullback-Leibler (KL) divergence;
the KL divergence regularizes the learned importance sampling functions, constraining them to a latent prior;
the latent prior distribution is defined as a Gaussian distribution, and the KL regularization of the structural and texture encoders is as follows:
where I_m denotes the damaged image; z denotes the latent space, i.e. the compressed data space corresponding to the structural and texture features, in which similar data points lie close together; q_ψ and its texture counterpart are the importance sampling functions of the image structure distribution and the texture distribution, respectively; N(0, I) denotes a Gaussian distribution; L_KL^S denotes the KL divergence loss function of the structural features; L_KL^T denotes the KL divergence loss function of the texture features.
3. The structure and texture feature guided dual-encoder image restoration method according to claim 1, wherein the cross-semantic attention module in S2 is placed after the double-encoder module; the structural encoder feature space F_S and the texture encoder feature space F_T are each mapped to the latent space by a 1×1 convolution filter, and the cross-semantic attention module computes attention over the two feature spaces to obtain their attention scores,
where
s_ij = Q(F_T)^T K(F_S)    (4)
In formulas (3)-(4), β_{j,i} indicates the extent to which the model attends to the i-th location when synthesizing the j-th region; N denotes the number of pixels of the coarsely restored image; s_ij is the product of Q^T and K in the cross-attention module; Q(F_T) denotes the image texture features; K(F_S) denotes the image structural features. The output O of the cross-semantic attention module is finally computed as:
O = α F_ST + F_S    (6)
where
V(F_S) = W_v F_S    (7)
In formulas (5)-(7), F_ST denotes the attention score; V(F_S) denotes the image structural features to be weighted, and W_v is a 1×1 convolution filter; α is a learnable scale parameter balancing the weights of F_ST and F_S, and its initial value is set to 0; the cross-semantic attention network begins by learning the correlation of structural and texture features and finally extends to learning the interdependence and association of structural and texture features from the feature maps.
4. The structure and texture feature guided dual-encoder image restoration method according to claim 1, wherein the coarse restoration result of the image in S3 is reconstructed pixel by pixel using the mean absolute error (MAE) distance:
In formulas (8)-(9), I_out^C denotes the coarse restoration result of the image; I_g denotes the ground-truth image; M denotes the binary mask image; L_hole^C denotes the reconstruction loss function of the defective image mask region; L_valid^C denotes the reconstruction loss function of the non-masked region of the defective image. Thus, the pixel-by-pixel reconstruction loss L_r^C is:
In formula (10), λ_rec is the reconstruction loss balance factor and is set to 20. Furthermore, for the coarse restoration network in Fig. 1, the LSGAN method [9] is adopted to set the adversarial loss function; compared with the traditional GAN loss function it makes network training more stable and the generated images more natural, and it is defined as follows:
In formulas (11)-(12), D denotes the discriminator of the GAN; L_D denotes the adversarial loss function of the GAN discriminator; E_{Ig~pdata(Ig)} denotes the expectation over the probability density of the ground-truth images; L_G denotes the adversarial loss function of the GAN generator; the remaining expectation is taken over the probability density of the coarsely reconstructed images.
In summary, the total loss of the coarse repair network is defined as:
5. The structure and texture feature guided dual-encoder image restoration method according to claim 1, wherein the fused image I_fuse in S4 is defined as follows:
I_fuse = I_out_m + (1 - M) * I_g    (14)
In equation (14), the fused image I_fuse combines the mask region of the coarsely restored image, I_out_m = M × I_out^C, with the non-masked region of the ground-truth image I_g.
6. The structure and texture feature guided dual-encoder image restoration method according to claim 1, wherein the attention weight matrix β_{j,i} in S7 is computed as follows:
where
s_ij = Q(f_di)^T K(f_dj)    (16)
In formulas (15)-(16), β_{j,i} indicates the extent to which the model attends to the i-th location when synthesizing the j-th region; N denotes the number of pixels of the fine restoration image; f_dj denotes the decoding features; s_ij is the product of Q^T and K in the long-short-term attention module; K(f_dj) denotes the input information corresponding to the decoding features; Q(f_di)^T denotes the query vector corresponding to the decoding features:
Q(f_di)^T = (W_q f_di)^T    (17)
In formula (17), W_q is a 1×1 convolution filter. The self-attention layer of the long-short-term attention module is expressed as follows:
V_D(f_dj) denotes the input information corresponding to the decoding features to be weighted; to combine the fine-grained encoder features with the decoder features, skip connections link the encoder and decoder layers of the global refinement network; long-range spatial context features are obtained from the encoder-layer features and the scores of the attention weight matrix β_{j,i}, and the output F_out of the long-short-distance attention layer is computed as follows:
In formula (19), V_E(f_ei) denotes the input information corresponding to the encoding features to be weighted. The output O of the whole long-short-term attention module is computed as:
O = γ(1 - M) F_out + M f_e    (20)
where f_e denotes the encoding features of the long-range space; M denotes the binary mask; γ is a learnable scale parameter balancing the weights of F_out and f_e.
7. The structure and texture feature guided dual-encoder image restoration method according to claim 1, wherein the first training target of the refinement network in S7 is set as the reconstruction loss L_r^R; as with the reconstruction loss in the coarse restoration network, the MAE is used for pixel-by-pixel reconstruction:
In formulas (21)-(22), I_out^R denotes the fine restoration result of the image; L_hole^R denotes the reconstruction loss function of the fused-image mask region; L_valid^R denotes the reconstruction loss function of the non-masked region of the fused image. The invention also adds a perceptual loss [10] and a style loss [11]: feature extraction is performed on the image using a pretrained VGG-16 network, and the two losses are computed on the spatial features. The perceptual loss L_per^R is defined as follows:
In formula (23), F_i denotes the i-th layer feature map of the pretrained VGG-16 network. The style loss L_style^R is defined as follows:
where G_i denotes a Gram matrix, i.e. the covariance matrix between features, representing the correlation between the features. In summary, the overall loss L^R of the global refinement network is:
where λ_rec, λ_p and λ_s are balance factors.
CN202310501736.6A 2023-05-06 2023-05-06 Structure and texture feature guided double-encoder image restoration method Active CN116523985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310501736.6A CN116523985B (en) 2023-05-06 2023-05-06 Structure and texture feature guided double-encoder image restoration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310501736.6A CN116523985B (en) 2023-05-06 2023-05-06 Structure and texture feature guided double-encoder image restoration method

Publications (2)

Publication Number Publication Date
CN116523985A true CN116523985A (en) 2023-08-01
CN116523985B CN116523985B (en) 2024-01-02

Family

ID=87402696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310501736.6A Active CN116523985B (en) 2023-05-06 2023-05-06 Structure and texture feature guided double-encoder image restoration method

Country Status (1)

Country Link
CN (1) CN116523985B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196981A (en) * 2023-09-08 2023-12-08 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129234A (en) * 2021-04-20 2021-07-16 河南科技学院 Incomplete image fine repairing method based on intra-field and extra-field feature fusion
CN113837953A (en) * 2021-06-11 2021-12-24 西安工业大学 Image restoration method based on generation countermeasure network
WO2022064222A1 (en) * 2020-09-25 2022-03-31 Panakeia Technologies Limited A method of processing an image of tissue and a system for processing an image of tissue
CN114511463A (en) * 2022-02-11 2022-05-17 陕西师范大学 Digital image repairing method, device and equipment and readable storage medium
US20220187841A1 (en) * 2020-12-10 2022-06-16 AI Incorporated Method of lightweight simultaneous localization and mapping performed on a real-time computing and battery operated wheeled device
CN114973136A (en) * 2022-05-31 2022-08-30 河南工业大学 Scene image recognition method under extreme conditions
CN115731597A (en) * 2022-11-24 2023-03-03 四川轻化工大学 Automatic segmentation and restoration management platform and method for mask image of face mask
CN115829880A (en) * 2022-12-23 2023-03-21 南京信息工程大学 Image restoration method based on context structure attention pyramid network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111292265A (en) * 2020-01-22 2020-06-16 东华大学 Image restoration method based on generating type antagonistic neural network

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022064222A1 (en) * 2020-09-25 2022-03-31 Panakeia Technologies Limited A method of processing an image of tissue and a system for processing an image of tissue
US20220187841A1 (en) * 2020-12-10 2022-06-16 AI Incorporated Method of lightweight simultaneous localization and mapping performed on a real-time computing and battery operated wheeled device
CN113129234A (en) * 2021-04-20 2021-07-16 河南科技学院 Incomplete image fine repairing method based on intra-field and extra-field feature fusion
CN113837953A (en) * 2021-06-11 2021-12-24 西安工业大学 Image restoration method based on generation countermeasure network
CN114511463A (en) * 2022-02-11 2022-05-17 陕西师范大学 Digital image repairing method, device and equipment and readable storage medium
CN114973136A (en) * 2022-05-31 2022-08-30 河南工业大学 Scene image recognition method under extreme conditions
CN115731597A (en) * 2022-11-24 2023-03-03 四川轻化工大学 Automatic segmentation and restoration management platform and method for mask image of face mask
CN115829880A (en) * 2022-12-23 2023-03-21 南京信息工程大学 Image restoration method based on context structure attention pyramid network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张雪菲; 程乐超; 白升利; 张繁; 孙农亮; 王章野: "Face image inpainting based on a variational auto-encoder", Journal of Computer-Aided Design & Computer Graphics, no. 03
李健 et al.: "Image inpainting combining dual encoders with adversarial training", Computer Engineering and Applications, vol. 57, no. 7
罗海银: "A survey of image inpainting methods", Journal of Frontiers of Computer Science and Technology, vol. 16, no. 10

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196981A (en) * 2023-09-08 2023-12-08 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation
CN117196981B (en) * 2023-09-08 2024-04-26 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation

Also Published As

Publication number Publication date
CN116523985B (en) 2024-01-02

Similar Documents

Publication Publication Date Title
CN111784602B (en) Method for generating countermeasure network for image restoration
Xiang et al. Deep learning for image inpainting: A survey
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
CN111787187B (en) Method, system and terminal for repairing video by utilizing deep convolutional neural network
CN114445292A (en) Multi-stage progressive underwater image enhancement method
CN114283080A (en) Multi-mode feature fusion text-guided image compression noise removal method
CN114881871A (en) Attention-fused single image rain removing method
CN116523985B (en) Structure and texture feature guided double-encoder image restoration method
Chen et al. MICU: Image super-resolution via multi-level information compensation and U-net
CN116258652B (en) Text image restoration model and method based on structure attention and text perception
CN114943656B (en) Face image restoration method and system
CN115565056A (en) Underwater image enhancement method and system based on condition generation countermeasure network
CN113469906A (en) Cross-layer global and local perception network method for image restoration
CN115731597A (en) Automatic segmentation and restoration management platform and method for mask image of face mask
Yang et al. A survey of super-resolution based on deep learning
CN116310394A (en) Saliency target detection method and device
Liu et al. Facial image inpainting using multi-level generative network
CN116109510A (en) Face image restoration method based on structure and texture dual generation
CN116703750A (en) Image defogging method and system based on edge attention and multi-order differential loss
CN116385289A (en) Progressive inscription character image restoration model and restoration method
CN116245861A (en) Cross multi-scale-based non-reference image quality evaluation method
Campana et al. Variable-hyperparameter visual transformer for efficient image inpainting
Fan et al. Image inpainting based on structural constraint and multi-scale feature fusion
CN114862696A (en) Facial image restoration method based on contour and semantic guidance
Li et al. Feature attention parallel aggregation network for single image haze removal

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant