CN114463209A - Image restoration method based on deep multi-feature collaborative learning - Google Patents

Image restoration method based on deep multi-feature collaborative learning

Info

Publication number
CN114463209A
Authority
CN
China
Prior art keywords
feature
image
texture
features
cte
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210089664.4A
Other languages
Chinese (zh)
Other versions
CN114463209B (en)
Inventor
王员根
林嘉裕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202210089664.4A priority Critical patent/CN114463209B/en
Publication of CN114463209A publication Critical patent/CN114463209A/en
Application granted granted Critical
Publication of CN114463209B publication Critical patent/CN114463209B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of image processing, in particular to an image restoration method based on deep multi-feature collaborative learning, which comprises the following steps: S1, inputting an image to be restored into a preset image feature encoder, and extracting effective features from the image to be restored through deep neural network encoding to form an effective image feature set; S2, decoding and repairing the effective image feature set through a preset image decoder, and forming a repaired image through a local discriminator and a global discriminator. The image feature encoder consists of six convolutional layers, wherein three shallow convolutional layers regroup texture features and three deep convolutional layers regroup structural features, yielding a structural feature set and a texture feature set. The image decoder comprises a soft-gated dual feature fusion module for fusing the structural features and the texture features, and a bilateral propagation feature aggregation module for balancing the features among channel information, context attention and feature space. The technique effectively suppresses artifacts in the repaired image, so that the repaired image has detailed textures and a better overall appearance.

Description

Image restoration method based on deep multi-feature collaborative learning
Technical Field
The invention relates to the field of image processing, in particular to an image restoration method based on deep multi-feature collaborative learning.
Background
With the advancement of information technology and the arrival of the digital age, digital images have become ubiquitous in daily life as carriers for recording and transferring visual information, and their volume continues to grow at a remarkable rate. However, digital images are often damaged during capture, storage, processing and transmission, or the integrity of the stored information is lost due to occlusion. To recover the lost portion of a damaged digital image, current techniques restore the missing information as reasonably as possible from the characteristics of the remaining image data, that is, from the image content that is neither damaged nor occluded; this is commonly called image restoration (inpainting) technology.
Image restoration aims to reconstruct damaged areas or remove unwanted areas in an image while improving its visual quality, and is widely used in low-level vision tasks such as restoring damaged photographs or removing target regions. Conventional restoration methods fall into two categories: diffusion-based methods and block-based methods.
For example, Liu et al. proposed a mutual encoder-decoder restoration method based on feature equalization, in which the deep and shallow convolutional feature layers of an encoder-decoder are treated as the structure and texture of the image, respectively. The deep features are sent to the structure branch and the shallow features to the texture branch. In each branch, the holes are filled at multiple scales. The features from the two branches are then concatenated for channel equalization and feature equalization: channel equalization uses a squeeze-and-excitation network (SENet), and feature equalization uses a bilateral propagation activation function to re-balance the channel attention and achieve spatial equalization. Finally, the output image is generated through skip connections.
Another technique is a two-stage image restoration algorithm based on a bidirectional cascade edge detection network (BDCN) and U-Net incomplete-edge generation. In the first stage, image edge information is extracted by the BDCN network, which replaces the Canny operator in extracting the edges of the incomplete region; each layer of the network learns edge features at a specific scale, and the multi-scale edge features are fused. The contracting path of a U-Net architecture then extracts the edge features of the incomplete image, and the expanding path restores the image edge and texture information. In the second stage, dilated convolutions are used for down-sampling and up-sampling, and the missing image content is reconstructed with rich detail through a residual network.
He et al. proposed a cascaded generative adversarial network image restoration algorithm formed by connecting a coarse generation sub-network and a refinement generation sub-network in series. A parallel convolution module designed in the coarse generation network connects three convolution paths and one deep convolution path in parallel, which alleviates the vanishing-gradient problem when the convolution layers become deep. A cascaded residual module is introduced in the deep convolution path; cross-cascading the two-layer convolutions of four channels effectively enhances feature reuse. Finally, the convolution result is added element-wise to the module's input feature map for local residual learning, improving the expressive power of the network.
Existing diffusion-based methods propagate the appearance information of neighboring content to fill the missing area; because they rely only on a search mechanism over neighboring content, they produce obvious artifacts when repairing images with large defective areas. Block-based methods fill missing regions by searching for the most similar blocks in the undamaged region; although this captures distant information, it struggles to generate semantically reasonable images because it lacks high-level structural understanding. Deep-learning-based methods can understand high-level semantics and generate reasonable content, but for lack of an effective multi-feature fusion technique, the actual repair results of existing image restoration methods are still not natural and complete.
Disclosure of Invention
To address the technical problems of artifacts and of unnatural structures and textures in existing image restoration techniques, the invention provides an image restoration method based on deep multi-feature collaborative learning.
The image restoration method based on deep multi-feature collaborative learning comprises the following steps:
s1, inputting an image to be restored into a preset image feature encoder, and performing effective feature extraction on the image to be restored through deep neural network encoding to form an effective image feature set;
s2, decoding and repairing the effective image characteristic set through a preset image decoder, and forming a repaired image after passing through a local discriminator and a global discriminator;
the image feature encoder consists of six convolutional layers, wherein three shallow convolutional layers are used for reorganizing texture features to represent image details, and three deep convolutional layers are used for reorganizing structural features to represent image semantics to obtain a structural feature set and a texture feature set;
the image decoder comprises a soft gate control dual-feature fusion module used for fusing the structural features and the texture features, and a double-side propagation feature aggregation module used for balancing the features among channel information, context attention and feature space.
Preferably, the texture features and the structural features each fill the damaged area using three parallel streams with different kernel sizes; the outputs of the three streams are combined into an output feature map, which is then mapped back to the same size as the input features.
Further, the outputs of the structural features and the texture features satisfy:

L_rst = ||g(F_cst) - I_st||_1  (1-1)

L_rte = ||g(F_cte) - I_gt||_1  (1-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution operation with kernel size 1 that maps F_cst and F_cte to color images, I_gt and I_st denote the real image and its structure image respectively, and I_st is generated using an edge-preserving image smoothing method.
Preferably, the soft-gated dual feature fusion module comprises a structure-guided texture feature unit configured to compute:

G_te = σ(SE(h([F_cst, F_cte])))  (2-1)

F'_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte  (2-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, h(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te is a soft gate that controls the degree of refinement of the texture information, F'_cte denotes the texture feature with structure perception, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Preferably, the soft-gated dual feature fusion module comprises a texture-guided structure feature unit configured to compute:

G_st = σ(SE(k([F_cst, F_cte])))  (2-3)

F'_cst = γ(G_st ⊙ F_cte) ⊕ F_cst  (2-4)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, k(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st is a soft gate that controls the degree of refinement of the structure information, F'_cst denotes the structural feature with texture perception, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
F_fu = v([F'_cst, F'_cte])  (2-5)

where F'_cte and F'_cst denote the texture feature with structure perception and the structural feature with texture perception respectively, v(·) is a convolution operation with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
Preferably, the bilateral propagation feature aggregation module includes a channel information fusion unit, which captures channel information through adaptive kernel selection using a dynamic kernel selection network (SKNet) to obtain the feature map F'_fu.
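As an illustration of how such a channel information fusion unit could be realized, the following is a minimal PyTorch sketch of an SKNet-style adaptive kernel selection that re-weights the channels of the fused feature F_fu to produce F'_fu; the use of exactly two branches, the branch kernel sizes 3 and 5, the reduction ratio and the class name SelectiveKernelFusion are assumptions, not details given in the patent.

```python
import torch
import torch.nn as nn

class SelectiveKernelFusion(nn.Module):
    """Channel-information fusion sketch in the spirit of SKNet: two branches with
    different kernel sizes, a squeeze step, and a per-channel softmax over branches
    that adaptively selects the kernel."""

    def __init__(self, ch=256, r=16):
        super().__init__()
        self.branch3 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(nn.Conv2d(ch, ch, 5, padding=2), nn.ReLU(inplace=True))
        self.squeeze = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True))
        self.attn = nn.Conv2d(ch // r, 2 * ch, 1)   # per-channel scores for the 2 branches

    def forward(self, f_fu):
        u3, u5 = self.branch3(f_fu), self.branch5(f_fu)
        s = self.squeeze(u3 + u5)                                  # global channel statistics
        scores = self.attn(s).view(f_fu.size(0), 2, f_fu.size(1), 1, 1)
        a = torch.softmax(scores, dim=1)                           # select kernels per channel
        return a[:, 0] * u3 + a[:, 1] * u5                         # channel-reweighted F'_fu

# usage on the fused feature F_fu produced by the SDFF module
f_fu_prime = SelectiveKernelFusion()(torch.randn(1, 256, 32, 32))
```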
Further, the bilateral propagation feature aggregation module includes a context attention fusion unit configured to capture the relationship between input feature blocks by computing cosine similarity, specifically:

s_(i,j) = ⟨ p_i / ||p_i||, p_j / ||p_j|| ⟩  (3-1)

s'_(i,j) = exp(s_(i,j)) / Σ_(j=1..N) exp(s_(i,j))  (3-2)

p'_i = Σ_(j=1..N) s'_(i,j) · p_j  (3-3)

where the feature F'_fu is divided into non-overlapping blocks of 3 × 3 pixels, s_(i,j) denotes the cosine similarity between feature blocks, s'_(i,j) denotes the attention score obtained by the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F'_fu, N is the total number of blocks of F'_fu, and the feature map F_ca is reconstructed from the attention scores by recombining the blocks p'_i.
Preferably, the bilateral propagation feature aggregation module includes a spatial information fusion unit, which specifically computes:

F^s_i = (1 / C(x)) Σ_(j∈s) g_αs(i, j) · f(x_i, x_j) · x_j  (3-4)

F^r_i = (1 / C(x)) Σ_(j∈v) f(x_i, x_j) · x_j  (3-5)

where F^s and F^r denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature F_ca, x_j are the neighboring feature channels at positions j around channel i, g_αs(·) is a Gaussian function that adjusts the spatial contributions of the neighboring feature channels, C(x) is the number of positions in F_ca, and f(·) is a dot-product operation.
Further, the output feature channel is calculated as:

x'_i = q([F^s_i, F^r_i])  (3-6)

where F^s and F^r denote the spatial and range similarity feature maps, and q denotes a convolutional layer with kernel size 1. Further, the channel features are aggregated to obtain the reconstructed feature map F_bp, and F'_fu and F_bp are then fused by concatenation and convolution to obtain F_sc:

F_sc = z([F'_fu, F_bp])  (3-7)

where F_bp is the recombined multi-channel feature, F'_fu is the feature obtained after re-weighting the channel information, F_sc is the final fused repair feature, and z is a convolution operation with a convolution kernel size of 1.
Preferably, the global and local discriminators each consist of five convolutional layers with a convolution kernel size of 4 and a stride of 2, and all layers except the last use Leaky ReLU with a slope of 0.2.
Compared with the prior art, the image restoration method based on the depth multi-feature collaborative learning has the following beneficial effects:
compared with the prior art, the method has the advantages that not only the relation between the image structure and the texture is considered, but also the relation between the image contexts is considered. The method adopts a single-stage network, and uses double branches to respectively learn the structure and the texture of the image, so that the generated structure and the texture are more consistent. And the image structure information is fully utilized, so that the generated image structure is more reasonable, and the visual image result is more real. Specifically, the consistency of the structure and the texture is enhanced through a soft gating dual-feature fusion (SDFF) module, and the blurring and the artifacts around the hole area can be effectively reduced through a switching and recombination mode. The connection from local features to overall consistency is enhanced through a Bilateral Propagation Feature Aggregation (BPFA) module, and the connection between context attention, channel information and feature space is considered, so that the repaired image has detailed textures and better image appearance.
Drawings
The present invention is further described with reference to the accompanying drawings, but the embodiments in the drawings do not limit the present invention in any way, and for those skilled in the art, other drawings may be obtained according to the following drawings without creative efforts.
FIG. 1 is a block diagram of the multi-feature collaborative learning network provided by the present invention;
FIG. 2 is a schematic diagram of a soft-gated dual feature fusion module;
FIG. 3 is a schematic diagram of a bilateral propagation feature aggregation module;
FIG. 4 compares the repair results of the present invention on irregular holes with those of existing deep-learning-based image restoration techniques;
FIG. 5 compares the repair results of the present invention on central holes with those of existing deep-learning-based image restoration techniques;
fig. 6 is a graph of the results of an image repair ablation experiment of the present invention.
Detailed Description
The image restoration method based on deep multi-feature collaborative learning provided by the present invention is further described below with reference to the accompanying drawings. It should be noted that the technical solution and the design principle of the present invention are explained in detail below using only one preferred technical solution as an example.
The core of the image restoration method based on deep multi-feature collaborative learning provided by the invention is to provide a multi-feature collaborative learning network for restoring damaged images. First, this patent proposes a soft-gated dual feature fusion (SDFF) module that enables the coordinated information exchange of image structure and texture, thereby enabling them to strengthen the connection between each other. Second, the patent uses a Bilateral Propagation Feature Aggregation (BPFA) module to further refine the generated structure and texture by enhancing the connection from local features to global consistency through collaborative learning of context attention, channel information, and feature space. In addition, the invention uses an end-to-end single-stage network training mode, and adopts double branches to respectively learn the image structure and the texture in a single stage, so that the image artifact can be effectively reduced and a more real image result can be generated.
Specifically, the overall backbone model of the image restoration method based on deep multi-feature collaborative learning is shown in FIG. 1, and the method comprises the following parts: (1) the encoder consists of six convolutional layers; the three shallow feature maps are regrouped into texture features to represent image details, while the three deep feature maps are regrouped into structural features to represent image semantics; (2) two branches are adopted to learn the structural and texture features respectively; (3) a soft-gated dual feature fusion module fuses the structural and texture features generated by the two branches, see FIG. 2; (4) a bilateral propagation feature aggregation module equalizes the features among channel information, context attention and feature space, see FIG. 3; specifically, a dynamic kernel selection network (SKNet) captures channel information through adaptive convolution kernel selection, a context attention (CA) module captures the contextual relations within the image, and a bilateral propagation activation (BPA) module captures the relations between the spatial and range domains; (5) finally, the decoder is given guidance information through skip connections, synthesizing the structure and texture branches to produce more plausible images; (6) local and global discriminators make the generated image more realistic.
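To make the trunk model of FIG. 1 more concrete, the following is a minimal PyTorch sketch of part (1), the six-layer encoder that regroups shallow activations into a texture feature and deep activations into a structure feature. The channel widths, the four-channel input (masked image plus mask), the common regrouping resolution and the class name FeatureRegroupEncoder are illustrative assumptions rather than details specified by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureRegroupEncoder(nn.Module):
    """Six-layer convolutional encoder: the three shallow activations are regrouped
    into a texture feature (image details) and the three deep activations into a
    structure feature (image semantics)."""

    def __init__(self, in_ch=4, widths=(64, 128, 256, 512, 512, 512), out_ch=256):
        super().__init__()
        layers, c = [], in_ch
        for w in widths:
            layers.append(nn.Sequential(
                nn.Conv2d(c, w, kernel_size=4, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            c = w
        self.layers = nn.ModuleList(layers)
        self.to_texture = nn.Conv2d(sum(widths[:3]), out_ch, kernel_size=1)
        self.to_structure = nn.Conv2d(sum(widths[3:]), out_ch, kernel_size=1)

    @staticmethod
    def _merge(feats, size, proj):
        # resize every selected layer output to a common grid, then merge with a 1x1 conv
        resized = [F.interpolate(f, size=size, mode='bilinear', align_corners=False)
                   for f in feats]
        return proj(torch.cat(resized, dim=1))

    def forward(self, x):
        feats, h = [], x
        for layer in self.layers:
            h = layer(h)
            feats.append(h)
        size = feats[2].shape[-2:]                              # common regrouping grid
        f_te = self._merge(feats[:3], size, self.to_texture)    # shallow -> texture F_te
        f_st = self._merge(feats[3:], size, self.to_structure)  # deep    -> structure F_st
        return f_st, f_te

# usage: a 256x256 RGB image concatenated with its binary hole mask (4 input channels)
x = torch.randn(1, 4, 256, 256)
f_st, f_te = FeatureRegroupEncoder()(x)
print(f_st.shape, f_te.shape)   # both torch.Size([1, 256, 32, 32])
```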
Specifically, the image restoration method based on deep multi-feature collaborative learning comprises the following steps:
s1, inputting an image to be restored into a preset image feature encoder, and performing effective feature extraction on the image to be restored through deep neural network coding to form an effective image feature set;
s2, decoding and repairing the effective image feature set through a preset image decoder, and forming a repaired image through a local discriminator and a global discriminator;
the image feature encoder consists of six convolutional layers, wherein three shallow convolutional layers are used for reorganizing texture features to represent image details, and three deep convolutional layers are used for reorganizing structural features to represent image semantics to obtain a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural features and the texture features, and a bilateral propagation feature aggregation module for balancing the features among channel information, context attention and feature space.
Preferably, the texture features and the structural features each fill the damaged area using three parallel streams with different kernel sizes; the outputs of the three streams are combined into an output feature map, which is then mapped back to the same size as the input features.
Further, the outputs of the structural features and the texture features satisfy:

L_rst = ||g(F_cst) - I_st||_1  (1-1)

L_rte = ||g(F_cte) - I_gt||_1  (1-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution operation with kernel size 1 that maps F_cst and F_cte to color images, I_gt and I_st denote the real image and its structure image respectively, and I_st is generated using an edge-preserving image smoothing method.
Preferably, the soft-gated dual feature fusion module comprises a structure-guided texture feature unit configured to compute:

G_te = σ(SE(h([F_cst, F_cte])))  (2-1)

F'_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte  (2-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, h(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te is a soft gate that controls the degree of refinement of the texture information, F'_cte denotes the texture feature with structure perception, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Preferably, the soft-gated dual feature fusion module comprises a texture-guided structure feature unit configured to compute:

G_st = σ(SE(k([F_cst, F_cte])))  (2-3)

F'_cst = γ(G_st ⊙ F_cte) ⊕ F_cst  (2-4)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, k(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st is a soft gate that controls the degree of refinement of the structure information, F'_cst denotes the structural feature with texture perception, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
F_fu = v([F'_cst, F'_cte])  (2-5)

where F'_cte and F'_cst denote the texture feature with structure perception and the structural feature with texture perception respectively, v(·) is a convolution operation with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
Preferably, the bilateral propagation feature aggregation module includes a channel information fusion unit, which captures channel information through adaptive kernel selection using a dynamic kernel selection network (SKNet) to obtain the feature map F'_fu.
Further, the bilateral propagation feature aggregation module includes a context attention fusion unit configured to capture the relationship between input feature blocks by computing cosine similarity, specifically:

s_(i,j) = ⟨ p_i / ||p_i||, p_j / ||p_j|| ⟩  (3-1)

s'_(i,j) = exp(s_(i,j)) / Σ_(j=1..N) exp(s_(i,j))  (3-2)

p'_i = Σ_(j=1..N) s'_(i,j) · p_j  (3-3)

where the feature F'_fu is divided into non-overlapping blocks of 3 × 3 pixels, s_(i,j) denotes the cosine similarity between feature blocks, s'_(i,j) denotes the attention score obtained by the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F'_fu, N is the total number of blocks of F'_fu, and the feature map F_ca is reconstructed from the attention scores by recombining the blocks p'_i.
Preferably, the bilateral propagation feature aggregation module includes a spatial information fusion unit, which specifically computes:

F^s_i = (1 / C(x)) Σ_(j∈s) g_αs(i, j) · f(x_i, x_j) · x_j  (3-4)

F^r_i = (1 / C(x)) Σ_(j∈v) f(x_i, x_j) · x_j  (3-5)

where F^s and F^r denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature F_ca, x_j are the neighboring feature channels at positions j around channel i, g_αs(·) is a Gaussian function that adjusts the spatial contributions of the neighboring feature channels, C(x) is the number of positions in F_ca, and f(·) is a dot-product operation.
Further, the output feature channel is calculated as:

x'_i = q([F^s_i, F^r_i])  (3-6)

where F^s and F^r denote the spatial and range similarity feature maps, and q denotes a convolutional layer with kernel size 1. Further, the channel features are aggregated to obtain the reconstructed feature map F_bp, and F'_fu and F_bp are then fused by concatenation and convolution to obtain F_sc:

F_sc = z([F'_fu, F_bp])  (3-7)

where F_bp is the recombined multi-channel feature, F'_fu is the feature obtained after re-weighting the channel information, F_sc is the final fused repair feature, and z is a convolution operation with a convolution kernel size of 1.
Preferably, the global and local discriminators each consist of five convolutional layers with a convolution kernel size of 4 and a stride of 2, and all layers except the last use Leaky ReLU with a slope of 0.2.
The following points describe the core technical process in detail:
(1) Structure and texture branches
The texture features recombined from the shallow convolutions are denoted as F_te, and the structural features recombined from the deep convolutions are denoted as F_st. In each branch, three parallel streams with different kernel sizes are used to fill the damaged area at different scales. Finally, the output feature maps of the three streams are combined, and the combined features are mapped back to the same size as the input features. Here, F_cst and F_cte denote the outputs of the structure branch and the texture branch, respectively. To ensure that each branch focuses on structure and texture respectively, we use two reconstruction losses, denoted L_rst and L_rte. The pixel-level losses are defined as:

L_rst = ||g(F_cst) - I_st||_1  (1-1)

L_rte = ||g(F_cte) - I_gt||_1  (1-2)

where g(·) is a convolution operation with kernel size 1 whose purpose is to map F_cst and F_cte to color images. I_gt and I_st denote the real image and its structure image, respectively. I_st is generated using an edge-preserving smoothing method.
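A minimal sketch of one branch's multi-scale filling and of the pixel-level losses (1-1)/(1-2) follows; the stream kernel sizes 3/5/7, the channel width and the assumption that I_st and I_gt are resized to the feature resolution are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn as nn

class MultiScaleFillBranch(nn.Module):
    """One structure/texture branch: three parallel streams with different kernel sizes
    fill the hole region; their outputs are concatenated and mapped back to the input
    feature size by a 1x1 convolution."""

    def __init__(self, ch=256, kernels=(3, 5, 7)):
        super().__init__()
        self.streams = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, k, padding=k // 2), nn.ReLU(inplace=True))
            for k in kernels])
        self.merge = nn.Conv2d(ch * len(kernels), ch, kernel_size=1)

    def forward(self, f):
        return self.merge(torch.cat([s(f) for s in self.streams], dim=1))

# Pixel-level losses of Eqs. (1-1)/(1-2): g(.) is a 1x1 convolution mapping the branch
# output to a 3-channel color image; the structure branch is compared against the
# edge-preserving smoothed structure image I_st, the texture branch against the real
# image I_gt (both assumed resized to the feature resolution here).
g_st, g_te, l1 = nn.Conv2d(256, 3, 1), nn.Conv2d(256, 3, 1), nn.L1Loss()

def reconstruction_losses(f_cst, f_cte, i_st, i_gt):
    l_rst = l1(g_st(f_cst), i_st)   # Eq. (1-1)
    l_rte = l1(g_te(f_cte), i_gt)   # Eq. (1-2)
    return l_rst, l_rte

# usage with the branch outputs F_cst / F_cte (random stand-ins here)
f_cst = MultiScaleFillBranch()(torch.randn(1, 256, 32, 32))
f_cte = MultiScaleFillBranch()(torch.randn(1, 256, 32, 32))
l_rst, l_rte = reconstruction_losses(f_cst, f_cte,
                                     torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32))
```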
(2) Soft-gated dual feature fusion module
In this module, the structural features F_cst and the texture features F_cte generated by the two branches are combined more effectively. The two types of information are exchanged, and soft gating dynamically controls the mixing ratio to achieve dynamic combination. Specifically, to construct structure-guided texture features, a soft gate G_te controls the refinement of the texture information. It is defined as:

G_te = σ(SE(h([F_cst, F_cte])))  (2-1)

where h(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, and σ(·) is the Sigmoid activation function. Using the soft gate G_te, F_cst can be dynamically fused into F_cte:

F'_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte  (2-2)

where α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Likewise, the texture-guided structural feature F'_cst is defined as:

G_st = σ(SE(k([F_cst, F_cte])))  (2-3)

F'_cst = γ(G_st ⊙ F_cte) ⊕ F_cst  (2-4)

where k(·) is the same type of operation as h(·) and γ is a learnable parameter.

Finally, F'_cte and F'_cst are concatenated, and the feature F_fu is generated using a convolution operation v with kernel size 1:

F_fu = v([F'_cst, F'_cte])  (2-5)
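The following PyTorch sketch gives one plausible reading of the soft-gated dual feature fusion of Eqs. (2-1)-(2-5). The SE reduction ratio, the initialization of α, β and γ, and the exact grouping of the gated product in Eq. (2-2) are assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: re-weights channels (the SE(.) operation)."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.fc(x)

class SoftGatedDualFeatureFusion(nn.Module):
    """Soft-gated dual feature fusion (SDFF) following Eqs. (2-1)-(2-5)."""
    def __init__(self, ch=256):
        super().__init__()
        self.h = nn.Conv2d(2 * ch, ch, 3, padding=1)   # h(.) in Eq. (2-1)
        self.k = nn.Conv2d(2 * ch, ch, 3, padding=1)   # k(.) in Eq. (2-3)
        self.se_te, self.se_st = SEBlock(ch), SEBlock(ch)
        self.alpha = nn.Parameter(torch.zeros(1))
        self.beta = nn.Parameter(torch.ones(1))
        self.gamma = nn.Parameter(torch.zeros(1))
        self.v = nn.Conv2d(2 * ch, ch, 1)              # v(.) in Eq. (2-5)

    def forward(self, f_cst, f_cte):
        cat = torch.cat([f_cst, f_cte], dim=1)
        g_te = torch.sigmoid(self.se_te(self.h(cat)))                         # Eq. (2-1)
        f_cte_p = self.alpha * (self.beta * (g_te * f_cte) * f_cst) + f_cte   # Eq. (2-2)
        g_st = torch.sigmoid(self.se_st(self.k(cat)))                         # Eq. (2-3)
        f_cst_p = self.gamma * (g_st * f_cte) + f_cst                         # Eq. (2-4)
        return self.v(torch.cat([f_cst_p, f_cte_p], dim=1))                   # Eq. (2-5)

# usage with the two branch outputs
f_fu = SoftGatedDualFeatureFusion()(torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32))
```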
(3) Bilateral propagation feature aggregation module
This module is proposed to re-weight the channels and the spatial positions so that the image representation is more consistent. First, channel information is captured through adaptive kernel selection using a dynamic kernel selection network (SKNet), yielding the feature map F'_fu; this enhances the correlation between channels and maintains the consistency of the whole image. A context attention (CA) module is then introduced to capture the association between image blocks. Specifically, the input feature F'_fu is divided into blocks of 3 × 3 pixels, and the cosine similarity between blocks is calculated:

s_(i,j) = ⟨ p_i / ||p_i||, p_j / ||p_j|| ⟩  (3-1)

where p_i and p_j are the i-th and j-th blocks of the input feature F'_fu.

We use the Softmax function to obtain the attention score between each pair of blocks:

s'_(i,j) = exp(s_(i,j)) / Σ_(j=1..N) exp(s_(i,j))  (3-2)

where N is the total number of blocks of the input feature F'_fu. Next, the feature map is reconstructed using the attention scores:

p'_i = Σ_(j=1..N) s'_(i,j) · p_j  (3-3)

The reconstructed feature map F_ca is obtained by directly recombining the blocks p'_i.
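A direct, self-contained sketch of the context attention reconstruction of Eqs. (3-1)-(3-3) is given below; it builds the dense N × N similarity matrix explicitly and assumes the spatial size is divisible by the 3 × 3 block size.

```python
import torch
import torch.nn.functional as F

def contextual_attention(feat, patch=3):
    """Context attention of Eqs. (3-1)-(3-3): split the feature map into non-overlapping
    patch x patch blocks, score every pair of blocks by cosine similarity, normalise the
    scores with Softmax, and rebuild each block as the attention-weighted sum of all blocks."""
    b, c, hgt, wdt = feat.shape
    assert hgt % patch == 0 and wdt % patch == 0
    # (B, C*patch*patch, N) -> (B, N, D): one row per block p_i
    blocks = F.unfold(feat, kernel_size=patch, stride=patch).transpose(1, 2)
    normed = F.normalize(blocks, dim=-1)
    sim = torch.bmm(normed, normed.transpose(1, 2))      # cosine similarity,     Eq. (3-1)
    attn = F.softmax(sim, dim=-1)                        # attention scores,      Eq. (3-2)
    recon = torch.bmm(attn, blocks)                      # weighted recombination, Eq. (3-3)
    return F.fold(recon.transpose(1, 2), output_size=(hgt, wdt),
                  kernel_size=patch, stride=patch)

# usage on the channel-reweighted feature F'_fu (spatial size assumed divisible by 3)
f_fu_prime = torch.randn(1, 256, 48, 48)
f_ca = contextual_attention(f_fu_prime)   # reconstructed feature map F_ca
```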
In the spatial and range domains, we introduce a bilateral propagation activation (BPA) module to generate response values based on range and spatial distance. The response values are calculated as follows:

F^s_i = (1 / C(x)) Σ_(j∈s) g_αs(i, j) · f(x_i, x_j) · x_j  (3-4)

F^r_i = (1 / C(x)) Σ_(j∈v) f(x_i, x_j) · x_j  (3-5)

where x_i is the i-th feature channel of the input feature F_ca, x_j are the neighboring feature channels at positions j around channel i, g_αs(·) is a Gaussian function that adjusts the spatial contributions of the neighboring feature channels, C(x) is the number of positions in F_ca, and f(·) is a dot-product operation. In the spatial domain, j is explored within the neighborhood s for global propagation; s is set to the same size as the input feature in the experiments. In the range domain, v is a neighborhood of position i whose size is set to 3 × 3. In this way, the feature maps F^s and F^r are obtained by the spatial and range similarity measures, respectively.
each feature channel can compute:
Figure BDA0003488754240000128
where q represents the convolutional layer and the kernel size is 1.
Next, the channels are aggregated to obtain the reconstructed feature map F_bp. Finally, F'_fu and F_bp are concatenated and convolved to obtain F_sc:

F_sc = z([F'_fu, F_bp])  (3-7)

where z is a convolution operation with a convolution kernel size of 1.
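The sketch below illustrates one possible dense implementation of the bilateral propagation and fusion of Eqs. (3-4)-(3-7), treating i and j as spatial positions of the reconstructed map F_ca; the Gaussian bandwidth sigma, the O(N²) pairwise formulation and the thresholded 3 × 3 neighbourhood are assumptions.

```python
import torch
import torch.nn as nn

class BilateralPropagationActivation(nn.Module):
    """Bilateral propagation sketch for Eqs. (3-4)-(3-7): a spatial branch propagates
    information globally with a Gaussian distance weight times a dot-product (range)
    similarity, a range branch uses the dot-product similarity over a local 3x3 window,
    the two maps are fused by a 1x1 convolution q, and the result is concatenated with
    F'_fu and fused by another 1x1 convolution z."""

    def __init__(self, ch=256, sigma=8.0):
        super().__init__()
        self.sigma = sigma
        self.q = nn.Conv2d(2 * ch, ch, kernel_size=1)   # q(.) in Eq. (3-6)
        self.z = nn.Conv2d(2 * ch, ch, kernel_size=1)   # z(.) in Eq. (3-7)

    def forward(self, f_fu_prime, f_ca):
        b, c, h, w = f_ca.shape
        n = h * w
        x = f_ca.flatten(2).transpose(1, 2)                    # (B, N, C), rows x_i
        aff = torch.bmm(x, x.transpose(1, 2))                  # f(x_i, x_j): dot products
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
        pos = torch.stack([ys.flatten(), xs.flatten()], 1).float().to(f_ca.device)
        dist = torch.cdist(pos, pos)                           # pixel-grid distances
        gauss = torch.exp(-dist.pow(2) / (2 * self.sigma ** 2)).unsqueeze(0)
        local = (dist <= 1.5).float().unsqueeze(0)             # 3x3 neighbourhood v
        f_s = torch.bmm(aff * gauss, x) / n                    # Eq. (3-4), spatial branch
        f_r = torch.bmm(aff * local, x) / n                    # Eq. (3-5), range branch
        f_s = f_s.transpose(1, 2).reshape(b, c, h, w)
        f_r = f_r.transpose(1, 2).reshape(b, c, h, w)
        f_bp = self.q(torch.cat([f_s, f_r], dim=1))            # Eq. (3-6), per-position fusion
        return self.z(torch.cat([f_fu_prime, f_bp], dim=1))    # Eq. (3-7), F_sc

# usage with the channel-reweighted map F'_fu and the context-attention map F_ca
f_fu_prime = torch.randn(1, 256, 32, 32)
f_ca = torch.randn(1, 256, 32, 32)
f_sc = BilateralPropagationActivation()(f_fu_prime, f_ca)
```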
(4) Discriminators
The invention introduces global and local discriminators to make the local and global image content more consistent. Each discriminator consists of five convolutional layers with a convolution kernel size of 4 and a stride of 2; all layers except the last use Leaky ReLU with a slope of 0.2. In addition, spectral normalization is employed to stabilize training.
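A sketch of one such discriminator, matching the five 4 × 4 stride-2 convolutions, Leaky ReLU with slope 0.2 on all but the last layer, and the spectral normalization described above, is shown below; the channel widths and the single-channel patch-score output are assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def make_discriminator(in_ch=3, base=64):
    """Five-layer discriminator used both globally and locally: 4x4 convolutions with
    stride 2, spectral normalisation, and LeakyReLU(0.2) on every layer but the last."""
    chans = [in_ch, base, base * 2, base * 4, base * 8]
    layers = []
    for i in range(4):
        layers += [spectral_norm(nn.Conv2d(chans[i], chans[i + 1], 4, stride=2, padding=1)),
                   nn.LeakyReLU(0.2, inplace=True)]
    # last layer: no activation; outputs a patch-wise realism score map
    layers.append(spectral_norm(nn.Conv2d(chans[-1], 1, 4, stride=2, padding=1)))
    return nn.Sequential(*layers)

# the global discriminator sees the whole repaired image, the local one a crop
# around the hole region (an assumption about how "local" is defined)
global_d = make_discriminator()
local_d = make_discriminator()
score = global_d(torch.randn(1, 3, 256, 256))
```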
The above is only a preferred embodiment of the present invention, and it should be noted that the above preferred embodiment should not be considered as limiting the present invention, and the protection scope of the present invention should be subject to the scope defined by the claims. It will be apparent to those skilled in the art that several modifications, substitutions, improvements and embellishments of the steps can be made without departing from the spirit and scope of the invention, and these modifications, substitutions, improvements and embellishments should also be construed as the scope of the invention.

Claims (10)

1. An image restoration method based on depth multi-feature collaborative learning comprises the following steps:
s1, inputting an image to be restored into a preset image feature encoder, and performing effective feature extraction on the image to be restored through deep neural network encoding to form an effective image feature set;
s2, decoding and repairing the effective image characteristic set through a preset image decoder, and forming a repaired image after passing through a local discriminator and a global discriminator;
the image feature encoder is characterized by comprising six convolutional layers, wherein three shallow convolutional layers are used for reorganizing texture features to represent image details, and three deep convolutional layers are used for reorganizing structural features to represent image semantics to obtain a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural features and the texture features, and a bilateral propagation feature aggregation module for balancing the features among channel information, context attention and feature space.
2. An image inpainting method as claimed in claim 1, characterized in that the texture features and the structural features each fill the damaged area using three parallel streams with different kernel sizes, the outputs of the three streams are combined into an output feature map, the output feature map is then mapped back to the same size as the input features, and the outputs of the structural features and the texture features satisfy:

L_rst = ||g(F_cst) - I_st||_1  (1-1)

L_rte = ||g(F_cte) - I_gt||_1  (1-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution operation with kernel size 1 that maps F_cst and F_cte to color images, I_gt and I_st denote the real image and its structure image respectively, and I_st is generated using an edge-preserving image smoothing method.
3. The image inpainting method of claim 1, wherein the soft-gated dual feature fusion module comprises a structure-guided texture feature unit configured to compute:

G_te = σ(SE(h([F_cst, F_cte])))  (2-1)

F'_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte  (2-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, h(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te is a soft gate that controls the degree of refinement of the texture information, F'_cte denotes the texture feature with structure perception, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
4. The image inpainting method of claim 1, wherein the soft-gated dual feature fusion module comprises a texture-guided structure feature unit configured to compute:

G_st = σ(SE(k([F_cst, F_cte])))  (2-3)

F'_cst = γ(G_st ⊙ F_cte) ⊕ F_cst  (2-4)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, k(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st is a soft gate that controls the degree of refinement of the structure information, F'_cst denotes the structural feature with texture perception, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition; and

F_fu = v([F'_cst, F'_cte])  (2-5)

where F'_cte and F'_cst denote the texture feature with structure perception and the structural feature with texture perception respectively, v(·) is a convolution operation with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
5. The image inpainting method of claim 1, wherein the bilateral propagation feature aggregation module comprises a channel information fusion unit for capturing channel information through adaptive kernel selection using a dynamic kernel selection network to obtain the feature map F'_fu.
6. The image inpainting method of claim 5, wherein the bilateral propagation feature aggregation module includes a context attention fusion unit configured to capture the relationship between input feature blocks by computing cosine similarity, specifically:

s_(i,j) = ⟨ p_i / ||p_i||, p_j / ||p_j|| ⟩  (3-1)

s'_(i,j) = exp(s_(i,j)) / Σ_(j=1..N) exp(s_(i,j))  (3-2)

p'_i = Σ_(j=1..N) s'_(i,j) · p_j  (3-3)

where the feature F'_fu is divided into non-overlapping blocks, s_(i,j) denotes the cosine similarity between feature blocks, s'_(i,j) denotes the attention score obtained by the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F'_fu, N is the total number of blocks of F'_fu, and the feature map F_ca is reconstructed by recombining the feature blocks according to the attention scores.
7. The image inpainting method of claim 1, wherein the bilateral propagation feature aggregation module includes a spatial information fusion unit, which specifically computes:

F^s_i = (1 / C(x)) Σ_(j∈s) g_αs(i, j) · f(x_i, x_j) · x_j  (3-4)

F^r_i = (1 / C(x)) Σ_(j∈v) f(x_i, x_j) · x_j  (3-5)

where F^s and F^r denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature F_ca, x_j are the neighboring feature channels at positions j around channel i, g_αs(·) is a Gaussian function that adjusts the spatial contributions of the neighboring feature channels, C(x) is the number of positions in F_ca, and f(·) is a dot-product operation; in the spatial domain, j is explored within the neighborhood s for global propagation, and in the range domain, v is a neighborhood of position i whose size is set to 3 × 3.
8. The image inpainting method of claim 7, wherein the output feature channel is calculated as:

x'_i = q([F^s_i, F^r_i])  (3-6)

where F^s and F^r denote the spatial and range similarity feature maps, and q denotes a convolutional layer with kernel size 1.
9. An image inpainting method as claimed in claim 8, wherein the channel features are aggregated to obtain the reconstructed feature map F_bp, and F'_fu and F_bp are then fused by concatenation and convolution to obtain F_sc:

F_sc = z([F'_fu, F_bp])  (3-7)

where F_bp is the recombined multi-channel feature, F'_fu is the feature obtained after re-weighting the channel information, F_sc is the final fused repair feature, and z is a convolution operation with a convolution kernel size of 1.
10. The image inpainting method of claim 1, wherein the global and local discriminators each consist of five convolutional layers with a convolution kernel size of 4 and a stride of 2, all layers except the last use Leaky ReLU with a slope of 0.2, and spectral normalization is used to achieve stable training.
CN202210089664.4A 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning Active CN114463209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210089664.4A CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210089664.4A CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Publications (2)

Publication Number Publication Date
CN114463209A true CN114463209A (en) 2022-05-10
CN114463209B CN114463209B (en) 2022-12-16

Family

ID=81410572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210089664.4A Active CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Country Status (1)

Country Link
CN (1) CN114463209B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897742A (en) * 2022-06-10 2022-08-12 重庆师范大学 Image restoration method with texture and structural features fused twice
CN115082743A (en) * 2022-08-16 2022-09-20 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method
CN115841625A (en) * 2023-02-23 2023-03-24 杭州电子科技大学 Remote sensing building image extraction method based on improved U-Net model
CN116681980A (en) * 2023-07-31 2023-09-01 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
WO2023225808A1 (en) * 2022-05-23 2023-11-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Learned image compress ion and decompression using long and short attention module
CN117196981A (en) * 2023-09-08 2023-12-08 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation
CN117422911A (en) * 2023-10-20 2024-01-19 哈尔滨工业大学 Collaborative learning driven multi-category full-slice digital pathological image classification system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460746A (en) * 2018-04-10 2018-08-28 武汉大学 A kind of image repair method predicted based on structure and texture layer
CN112365422A (en) * 2020-11-17 2021-02-12 重庆邮电大学 Irregular missing image restoration method and system based on deep aggregation network
US20210125313A1 (en) * 2019-10-25 2021-04-29 Samsung Electronics Co., Ltd. Image processing method, apparatus, electronic device and computer readable storage medium
CN113298733A (en) * 2021-06-09 2021-08-24 华南理工大学 Implicit edge prior based scale progressive image completion method
WO2021232589A1 (en) * 2020-05-21 2021-11-25 平安国际智慧城市科技股份有限公司 Intention identification method, apparatus and device based on attention mechanism, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460746A (en) * 2018-04-10 2018-08-28 武汉大学 A kind of image repair method predicted based on structure and texture layer
US20210125313A1 (en) * 2019-10-25 2021-04-29 Samsung Electronics Co., Ltd. Image processing method, apparatus, electronic device and computer readable storage medium
WO2021232589A1 (en) * 2020-05-21 2021-11-25 平安国际智慧城市科技股份有限公司 Intention identification method, apparatus and device based on attention mechanism, and storage medium
CN112365422A (en) * 2020-11-17 2021-02-12 重庆邮电大学 Irregular missing image restoration method and system based on deep aggregation network
CN113298733A (en) * 2021-06-09 2021-08-24 华南理工大学 Implicit edge prior based scale progressive image completion method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIANG LI等: "Selective Kernel Networks", 《2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
XIEFAN GUO等: "Image Inpainting via Conditional Texture and Structure Dual Generation", 《2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 *
SONG ZHONGSHAN et al.: "Remote sensing scene classification based on bidirectional gated scale feature fusion", Journal of Computer Applications *
FAN CHUNQI et al.: "Recent advances in deep-learning-based digital image inpainting algorithms", Journal of Signal Processing *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023225808A1 (en) * 2022-05-23 2023-11-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Learned image compress ion and decompression using long and short attention module
CN114897742A (en) * 2022-06-10 2022-08-12 重庆师范大学 Image restoration method with texture and structural features fused twice
CN115082743A (en) * 2022-08-16 2022-09-20 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method
CN115082743B (en) * 2022-08-16 2022-12-06 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method
CN115841625A (en) * 2023-02-23 2023-03-24 杭州电子科技大学 Remote sensing building image extraction method based on improved U-Net model
CN116681980A (en) * 2023-07-31 2023-09-01 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
CN116681980B (en) * 2023-07-31 2023-10-20 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
CN117196981A (en) * 2023-09-08 2023-12-08 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation
CN117196981B (en) * 2023-09-08 2024-04-26 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation
CN117422911A (en) * 2023-10-20 2024-01-19 哈尔滨工业大学 Collaborative learning driven multi-category full-slice digital pathological image classification system
CN117422911B (en) * 2023-10-20 2024-04-30 哈尔滨工业大学 Collaborative learning driven multi-category full-slice digital pathological image classification system

Also Published As

Publication number Publication date
CN114463209B (en) 2022-12-16

Similar Documents

Publication Publication Date Title
CN114463209B (en) Image restoration method based on deep multi-feature collaborative learning
CN109447907B (en) Single image enhancement method based on full convolution neural network
CN111242238B (en) RGB-D image saliency target acquisition method
CN110689495B (en) Image restoration method for deep learning
CN111787187B (en) Method, system and terminal for repairing video by utilizing deep convolutional neural network
CN110223251B (en) Convolution neural network underwater image restoration method suitable for artificial and natural light sources
CN112991231B (en) Single-image super-image and perception image enhancement joint task learning system
CN114897742B (en) Image restoration method with texture and structural features fused twice
CN110349087A (en) RGB-D image superior quality grid generation method based on adaptability convolution
CN112422870B (en) Deep learning video frame insertion method based on knowledge distillation
CN114820341A (en) Image blind denoising method and system based on enhanced transform
CN113989129A (en) Image restoration method based on gating and context attention mechanism
CN115239564B (en) Mine image super-resolution reconstruction method combining semantic information
CN115170915A (en) Infrared and visible light image fusion method based on end-to-end attention network
CN116958534A (en) Image processing method, training method of image processing model and related device
CN116485741A (en) No-reference image quality evaluation method, system, electronic equipment and storage medium
CN115829876A (en) Real degraded image blind restoration method based on cross attention mechanism
CN115829880A (en) Image restoration method based on context structure attention pyramid network
CN116109510A (en) Face image restoration method based on structure and texture dual generation
CN112785502A (en) Light field image super-resolution method of hybrid camera based on texture migration
CN116523985A (en) Structure and texture feature guided double-encoder image restoration method
CN115035170A (en) Image restoration method based on global texture and structure
CN116167920A (en) Image compression and reconstruction method based on super-resolution and priori knowledge
JPS62131383A (en) Method and apparatus for evaluating movement of image train
CN114820316A (en) Video image super-resolution recovery system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant