CN114463209A - Image restoration method based on deep multi-feature collaborative learning - Google Patents

Image restoration method based on deep multi-feature collaborative learning

Info

Publication number
CN114463209A
Authority
CN
China
Prior art keywords
feature
image
texture
features
cte
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210089664.4A
Other languages
Chinese (zh)
Other versions
CN114463209B (en)
Inventor
王员根
林嘉裕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202210089664.4A priority Critical patent/CN114463209B/en
Publication of CN114463209A publication Critical patent/CN114463209A/en
Application granted granted Critical
Publication of CN114463209B publication Critical patent/CN114463209B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of image processing, in particular to an image restoration method based on deep multi-feature collaborative learning, which comprises the following steps: S1, inputting an image to be restored into a preset image feature encoder, and extracting effective features from the image to be restored through deep neural network encoding to form an effective image feature set; S2, decoding and repairing the effective image feature set through a preset image decoder, and forming a repaired image through a local discriminator and a global discriminator. The image feature encoder consists of six convolutional layers, wherein three shallow convolutional layers regroup texture features and three deep convolutional layers regroup structural features, yielding a structural feature set and a texture feature set. The image decoder comprises a soft-gated dual feature fusion module for fusing the structural features and the texture features, and a bilateral propagation feature aggregation module for balancing the features among channel information, context attention and feature space. The technique effectively suppresses artifacts in the repaired image, so that the repaired image has detailed textures and a better overall appearance.

Description

Image restoration method based on deep multi-feature collaborative learning
Technical Field
The invention relates to the field of image processing, in particular to an image restoration method based on deep multi-feature collaborative learning.
Background
With the advancement of information technology and the arrival of the digital age, digital images have become ubiquitous in daily life as carriers for recording and transferring visual information, and their volume continues to grow at a remarkable rate. However, digital images are often damaged during capture, storage, processing and transmission, or the integrity of the stored information is lost due to occlusion. To recover the lost portion of a damaged digital image, current techniques restore the missing information as reasonably as possible from the characteristics of the remaining image data, that is, from the image content that is neither damaged nor occluded; this is commonly called image restoration (inpainting) technology.
Image restoration aims to reconstruct damaged areas or remove unwanted areas in an image while improving its visual quality, and is widely used in low-level vision tasks such as restoring damaged photographs or removing target regions. Conventional restoration methods fall into two categories: diffusion-based methods and block-based methods.
For example, Liu et al. proposed a mutual encoder-decoder restoration method based on feature equalization, in which the deep and shallow convolutional feature layers of an encoder-decoder are treated as the structure and texture of the image, respectively. The deep features are sent to the structure branch and the shallow features to the texture branch. In each branch, the holes are filled at multiple scales. The features from the two branches are then concatenated for channel equalization and feature equalization: channel equalization uses a squeeze-and-excitation network (SENet), and feature equalization uses a bilateral propagation activation function to re-balance the channel attention and achieve spatial equalization. Finally, the output image is generated through skip connections.
Another technique is a two-stage image restoration algorithm based on a bidirectional cascade edge detection network (BDCN) and U-Net incomplete-edge generation. In the first stage, image edge information is extracted by the BDCN network, which replaces the Canny operator in extracting the edges of the incomplete region; each layer of the network learns edge features at a specific scale, and the multi-scale edge features are fused. The contracting path of a U-Net architecture then extracts the edge features of the incomplete image, and the expanding path restores the image edge and texture information. In the second stage, dilated convolutions are used for down-sampling and up-sampling, and the missing image content is reconstructed with rich detail through a residual network.
He et al. proposed a cascaded generative adversarial network image restoration algorithm formed by connecting a coarse generation sub-network and a refinement generation sub-network in series. A parallel convolution module designed in the coarse generation network connects three convolution paths and one deep convolution path in parallel, which alleviates the vanishing-gradient problem when the convolution layers become deep. A cascaded residual module is introduced in the deep convolution path; cross-cascading the two-layer convolutions of four channels effectively enhances feature reuse. Finally, the convolution result is added element-wise to the module's input feature map for local residual learning, improving the expressive power of the network.
Existing diffusion-based methods propagate the appearance information of neighboring content to fill the missing area; because they rely only on a search mechanism over neighboring content, they produce obvious artifacts when repairing images with large defective areas. Block-based methods fill missing regions by searching for the most similar blocks in the undamaged region; although this captures distant information, it struggles to generate semantically reasonable images because it lacks high-level structural understanding. Deep-learning-based methods can understand high-level semantics and generate reasonable content, but for lack of an effective multi-feature fusion technique, the actual repair results of existing image restoration methods are still not natural and complete.
Disclosure of Invention
To address the technical problems of artifacts and of unnatural structures and textures in existing image restoration techniques, the invention provides an image restoration method based on deep multi-feature collaborative learning.
The image restoration method based on deep multi-feature collaborative learning comprises the following steps:
s1, inputting an image to be restored into a preset image feature encoder, and performing effective feature extraction on the image to be restored through deep neural network encoding to form an effective image feature set;
s2, decoding and repairing the effective image characteristic set through a preset image decoder, and forming a repaired image after passing through a local discriminator and a global discriminator;
the image feature encoder consists of six convolutional layers, wherein three shallow convolutional layers are used for reorganizing texture features to represent image details, and three deep convolutional layers are used for reorganizing structural features to represent image semantics to obtain a structural feature set and a texture feature set;
the image decoder comprises a soft gate control dual-feature fusion module used for fusing the structural features and the texture features, and a double-side propagation feature aggregation module used for balancing the features among channel information, context attention and feature space.
Preferably, the texture features and the structural features each fill the damaged area using three parallel streams with different kernel sizes; the outputs of the three streams are combined into an output feature map, which is then mapped back to the same size as the input features.
Further, the outputs of the structural features and the texture features satisfy:

L_rst = ||g(F_cst) - I_st||_1  (1-1)

L_rte = ||g(F_cte) - I_gt||_1  (1-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution operation with kernel size 1 that maps F_cst and F_cte to color images, I_gt and I_st denote the real image and its structure image respectively, and I_st is generated using an edge-preserving image smoothing method.
Preferably, the soft-gated dual feature fusion module comprises a structure-guided texture feature unit configured to compute:

G_te = σ(SE(h([F_cst, F_cte])))  (2-1)

F'_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte  (2-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, h(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te is a soft gate that controls the degree of refinement of the texture information, F'_cte denotes the texture feature with structure perception, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Preferably, the soft-gated dual feature fusion module comprises a texture-guided structure feature unit configured to compute:

G_st = σ(SE(k([F_cst, F_cte])))  (2-3)

F'_cst = γ(G_st ⊙ F_cte) ⊕ F_cst  (2-4)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, k(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st is a soft gate that controls the degree of refinement of the structure information, F'_cst denotes the structural feature with texture perception, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
F_fu = v([F'_cst, F'_cte])  (2-5)

where F'_cte and F'_cst denote the texture feature with structure perception and the structural feature with texture perception respectively, v(·) is a convolution operation with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
Preferably, the bilateral propagation feature aggregation module includes a channel information fusion unit, which captures channel information through adaptive kernel selection using a dynamic kernel selection network (SKNet) to obtain the feature map F'_fu.
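As an illustration of how such a channel information fusion unit could be realized, the following is a minimal PyTorch sketch of an SKNet-style adaptive kernel selection that re-weights the channels of the fused feature F_fu to produce F'_fu; the use of exactly two branches, the branch kernel sizes 3 and 5, the reduction ratio and the class name SelectiveKernelFusion are assumptions, not details given in the patent.

```python
import torch
import torch.nn as nn

class SelectiveKernelFusion(nn.Module):
    """Channel-information fusion sketch in the spirit of SKNet: two branches with
    different kernel sizes, a squeeze step, and a per-channel softmax over branches
    that adaptively selects the kernel."""

    def __init__(self, ch=256, r=16):
        super().__init__()
        self.branch3 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(nn.Conv2d(ch, ch, 5, padding=2), nn.ReLU(inplace=True))
        self.squeeze = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True))
        self.attn = nn.Conv2d(ch // r, 2 * ch, 1)   # per-channel scores for the 2 branches

    def forward(self, f_fu):
        u3, u5 = self.branch3(f_fu), self.branch5(f_fu)
        s = self.squeeze(u3 + u5)                                  # global channel statistics
        scores = self.attn(s).view(f_fu.size(0), 2, f_fu.size(1), 1, 1)
        a = torch.softmax(scores, dim=1)                           # select kernels per channel
        return a[:, 0] * u3 + a[:, 1] * u5                         # channel-reweighted F'_fu

# usage on the fused feature F_fu produced by the SDFF module
f_fu_prime = SelectiveKernelFusion()(torch.randn(1, 256, 32, 32))
```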
Further, the bilateral propagation feature aggregation module includes a context attention fusion unit configured to capture the relationship between input feature blocks by computing cosine similarity, specifically:

s_(i,j) = ⟨ p_i / ||p_i||, p_j / ||p_j|| ⟩  (3-1)

s'_(i,j) = exp(s_(i,j)) / Σ_(j=1..N) exp(s_(i,j))  (3-2)

p'_i = Σ_(j=1..N) s'_(i,j) · p_j  (3-3)

where the feature F'_fu is divided into non-overlapping blocks of 3 × 3 pixels, s_(i,j) denotes the cosine similarity between feature blocks, s'_(i,j) denotes the attention score obtained by the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F'_fu, N is the total number of blocks of F'_fu, and the feature map F_ca is reconstructed from the attention scores by recombining the blocks p'_i.
Preferably, the bilateral propagation feature aggregation module includes a spatial information fusion unit, which specifically computes:

F^s_i = (1 / C(x)) Σ_(j∈s) g_αs(i, j) · f(x_i, x_j) · x_j  (3-4)

F^r_i = (1 / C(x)) Σ_(j∈v) f(x_i, x_j) · x_j  (3-5)

where F^s and F^r denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature F_ca, x_j are the neighboring feature channels at positions j around channel i, g_αs(·) is a Gaussian function that adjusts the spatial contributions of the neighboring feature channels, C(x) is the number of positions in F_ca, and f(·) is a dot-product operation.
Further, the output feature channel is calculated as:

x'_i = q([F^s_i, F^r_i])  (3-6)

where F^s and F^r denote the spatial and range similarity feature maps, and q denotes a convolutional layer with kernel size 1. Further, the channel features are aggregated to obtain the reconstructed feature map F_bp, and F'_fu and F_bp are then fused by concatenation and convolution to obtain F_sc:

F_sc = z([F'_fu, F_bp])  (3-7)

where F_bp is the recombined multi-channel feature, F'_fu is the feature obtained after re-weighting the channel information, F_sc is the final fused repair feature, and z is a convolution operation with a convolution kernel size of 1.
Preferably, the global and local discriminators each consist of five convolutional layers with a convolution kernel size of 4 and a stride of 2, and all layers except the last use Leaky ReLU with a slope of 0.2.
Compared with the prior art, the image restoration method based on the depth multi-feature collaborative learning has the following beneficial effects:
compared with the prior art, the method has the advantages that not only the relation between the image structure and the texture is considered, but also the relation between the image contexts is considered. The method adopts a single-stage network, and uses double branches to respectively learn the structure and the texture of the image, so that the generated structure and the texture are more consistent. And the image structure information is fully utilized, so that the generated image structure is more reasonable, and the visual image result is more real. Specifically, the consistency of the structure and the texture is enhanced through a soft gating dual-feature fusion (SDFF) module, and the blurring and the artifacts around the hole area can be effectively reduced through a switching and recombination mode. The connection from local features to overall consistency is enhanced through a Bilateral Propagation Feature Aggregation (BPFA) module, and the connection between context attention, channel information and feature space is considered, so that the repaired image has detailed textures and better image appearance.
Drawings
The present invention is further described with reference to the accompanying drawings, but the embodiments in the drawings do not limit the present invention in any way, and for those skilled in the art, other drawings may be obtained according to the following drawings without creative efforts.
FIG. 1 is a block diagram of the multi-feature collaborative learning network provided by the present invention;
FIG. 2 is a schematic diagram of a soft-gated dual feature fusion module;
FIG. 3 is a schematic diagram of a bilateral propagation feature aggregation module;
FIG. 4 compares the repair results of the present invention on irregular holes with those of existing deep-learning-based image restoration techniques;
FIG. 5 compares the repair results of the present invention on central holes with those of existing deep-learning-based image restoration techniques;
fig. 6 is a graph of the results of an image repair ablation experiment of the present invention.
Detailed Description
The image restoration method based on deep multi-feature collaborative learning provided by the present invention is further described below with reference to the accompanying drawings. It should be noted that the technical solution and the design principle of the present invention are explained in detail below using only one preferred technical solution as an example.
The core of the image restoration method based on deep multi-feature collaborative learning provided by the invention is to provide a multi-feature collaborative learning network for restoring damaged images. First, this patent proposes a soft-gated dual feature fusion (SDFF) module that enables the coordinated information exchange of image structure and texture, thereby enabling them to strengthen the connection between each other. Second, the patent uses a Bilateral Propagation Feature Aggregation (BPFA) module to further refine the generated structure and texture by enhancing the connection from local features to global consistency through collaborative learning of context attention, channel information, and feature space. In addition, the invention uses an end-to-end single-stage network training mode, and adopts double branches to respectively learn the image structure and the texture in a single stage, so that the image artifact can be effectively reduced and a more real image result can be generated.
Specifically, the overall backbone model of the image restoration method based on deep multi-feature collaborative learning is shown in FIG. 1, and the method comprises the following parts: (1) the encoder consists of six convolutional layers; the three shallow feature maps are regrouped into texture features to represent image details, while the three deep feature maps are regrouped into structural features to represent image semantics; (2) two branches are adopted to learn the structural and texture features respectively; (3) a soft-gated dual feature fusion module fuses the structural and texture features generated by the two branches, see FIG. 2; (4) a bilateral propagation feature aggregation module equalizes the features among channel information, context attention and feature space, see FIG. 3; specifically, a dynamic kernel selection network (SKNet) captures channel information through adaptive convolution kernel selection, a context attention (CA) module captures the contextual relations within the image, and a bilateral propagation activation (BPA) module captures the relations between the spatial and range domains; (5) finally, the decoder is given guidance information through skip connections, synthesizing the structure and texture branches to produce more plausible images; (6) local and global discriminators make the generated image more realistic.
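To make the trunk model of FIG. 1 more concrete, the following is a minimal PyTorch sketch of part (1), the six-layer encoder that regroups shallow activations into a texture feature and deep activations into a structure feature. The channel widths, the four-channel input (masked image plus mask), the common regrouping resolution and the class name FeatureRegroupEncoder are illustrative assumptions rather than details specified by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureRegroupEncoder(nn.Module):
    """Six-layer convolutional encoder: the three shallow activations are regrouped
    into a texture feature (image details) and the three deep activations into a
    structure feature (image semantics)."""

    def __init__(self, in_ch=4, widths=(64, 128, 256, 512, 512, 512), out_ch=256):
        super().__init__()
        layers, c = [], in_ch
        for w in widths:
            layers.append(nn.Sequential(
                nn.Conv2d(c, w, kernel_size=4, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            c = w
        self.layers = nn.ModuleList(layers)
        self.to_texture = nn.Conv2d(sum(widths[:3]), out_ch, kernel_size=1)
        self.to_structure = nn.Conv2d(sum(widths[3:]), out_ch, kernel_size=1)

    @staticmethod
    def _merge(feats, size, proj):
        # resize every selected layer output to a common grid, then merge with a 1x1 conv
        resized = [F.interpolate(f, size=size, mode='bilinear', align_corners=False)
                   for f in feats]
        return proj(torch.cat(resized, dim=1))

    def forward(self, x):
        feats, h = [], x
        for layer in self.layers:
            h = layer(h)
            feats.append(h)
        size = feats[2].shape[-2:]                              # common regrouping grid
        f_te = self._merge(feats[:3], size, self.to_texture)    # shallow -> texture F_te
        f_st = self._merge(feats[3:], size, self.to_structure)  # deep    -> structure F_st
        return f_st, f_te

# usage: a 256x256 RGB image concatenated with its binary hole mask (4 input channels)
x = torch.randn(1, 4, 256, 256)
f_st, f_te = FeatureRegroupEncoder()(x)
print(f_st.shape, f_te.shape)   # both torch.Size([1, 256, 32, 32])
```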
Specifically, the image restoration method based on deep multi-feature collaborative learning comprises the following steps:
s1, inputting an image to be restored into a preset image feature encoder, and performing effective feature extraction on the image to be restored through deep neural network coding to form an effective image feature set;
s2, decoding and repairing the effective image feature set through a preset image decoder, and forming a repaired image through a local discriminator and a global discriminator;
the image feature encoder consists of six convolutional layers, wherein three shallow convolutional layers are used for reorganizing texture features to represent image details, and three deep convolutional layers are used for reorganizing structural features to represent image semantics to obtain a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural features and the texture features, and a bilateral propagation feature aggregation module for balancing the features among channel information, context attention and feature space.
Preferably, the texture features and the structural features each fill the damaged area using three parallel streams with different kernel sizes; the outputs of the three streams are combined into an output feature map, which is then mapped back to the same size as the input features.
Further, the outputs of the structural features and the texture features satisfy:

L_rst = ||g(F_cst) - I_st||_1  (1-1)

L_rte = ||g(F_cte) - I_gt||_1  (1-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution operation with kernel size 1 that maps F_cst and F_cte to color images, I_gt and I_st denote the real image and its structure image respectively, and I_st is generated using an edge-preserving image smoothing method.
Preferably, the soft-gated dual feature fusion module comprises a structure-guided texture feature unit configured to compute:

G_te = σ(SE(h([F_cst, F_cte])))  (2-1)

F'_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte  (2-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, h(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te is a soft gate that controls the degree of refinement of the texture information, F'_cte denotes the texture feature with structure perception, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Preferably, the soft-gated dual feature fusion module comprises a texture-guided structure feature unit configured to compute:

G_st = σ(SE(k([F_cst, F_cte])))  (2-3)

F'_cst = γ(G_st ⊙ F_cte) ⊕ F_cst  (2-4)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, k(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st is a soft gate that controls the degree of refinement of the structure information, F'_cst denotes the structural feature with texture perception, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
F_fu = v([F'_cst, F'_cte])  (2-5)

where F'_cte and F'_cst denote the texture feature with structure perception and the structural feature with texture perception respectively, v(·) is a convolution operation with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
Preferably, the bilateral propagation feature aggregation module includes a channel information fusion unit, which captures channel information through adaptive kernel selection using a dynamic kernel selection network (SKNet) to obtain the feature map F'_fu.
Further, the bilateral propagation feature aggregation module includes a context attention fusion unit configured to capture the relationship between input feature blocks by computing cosine similarity, specifically:

s_(i,j) = ⟨ p_i / ||p_i||, p_j / ||p_j|| ⟩  (3-1)

s'_(i,j) = exp(s_(i,j)) / Σ_(j=1..N) exp(s_(i,j))  (3-2)

p'_i = Σ_(j=1..N) s'_(i,j) · p_j  (3-3)

where the feature F'_fu is divided into non-overlapping blocks of 3 × 3 pixels, s_(i,j) denotes the cosine similarity between feature blocks, s'_(i,j) denotes the attention score obtained by the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F'_fu, N is the total number of blocks of F'_fu, and the feature map F_ca is reconstructed from the attention scores by recombining the blocks p'_i.
Preferably, the bilateral propagation feature aggregation module includes a spatial information fusion unit, which specifically computes:

F^s_i = (1 / C(x)) Σ_(j∈s) g_αs(i, j) · f(x_i, x_j) · x_j  (3-4)

F^r_i = (1 / C(x)) Σ_(j∈v) f(x_i, x_j) · x_j  (3-5)

where F^s and F^r denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature F_ca, x_j are the neighboring feature channels at positions j around channel i, g_αs(·) is a Gaussian function that adjusts the spatial contributions of the neighboring feature channels, C(x) is the number of positions in F_ca, and f(·) is a dot-product operation.
Further, the output feature channel is calculated as:

x'_i = q([F^s_i, F^r_i])  (3-6)

where F^s and F^r denote the spatial and range similarity feature maps, and q denotes a convolutional layer with kernel size 1. Further, the channel features are aggregated to obtain the reconstructed feature map F_bp, and F'_fu and F_bp are then fused by concatenation and convolution to obtain F_sc:

F_sc = z([F'_fu, F_bp])  (3-7)

where F_bp is the recombined multi-channel feature, F'_fu is the feature obtained after re-weighting the channel information, F_sc is the final fused repair feature, and z is a convolution operation with a convolution kernel size of 1.
Preferably, the global and local discriminators each consist of five convolutional layers with a convolution kernel size of 4 and a stride of 2, and all layers except the last use Leaky ReLU with a slope of 0.2.
The following points describe the core technical process in detail:
(1) Structure and texture branches
The texture features recombined from the shallow convolutions are denoted as F_te, and the structural features recombined from the deep convolutions are denoted as F_st. In each branch, three parallel streams with different kernel sizes are used to fill the damaged area at different scales. Finally, the output feature maps of the three streams are combined, and the combined features are mapped back to the same size as the input features. Here, F_cst and F_cte denote the outputs of the structure branch and the texture branch, respectively. To ensure that each branch focuses on structure and texture respectively, we use two reconstruction losses, denoted L_rst and L_rte. The pixel-level losses are defined as:

L_rst = ||g(F_cst) - I_st||_1  (1-1)

L_rte = ||g(F_cte) - I_gt||_1  (1-2)

where g(·) is a convolution operation with kernel size 1 whose purpose is to map F_cst and F_cte to color images. I_gt and I_st denote the real image and its structure image, respectively. I_st is generated using an edge-preserving smoothing method.
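A minimal sketch of one branch's multi-scale filling and of the pixel-level losses (1-1)/(1-2) follows; the stream kernel sizes 3/5/7, the channel width and the assumption that I_st and I_gt are resized to the feature resolution are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn as nn

class MultiScaleFillBranch(nn.Module):
    """One structure/texture branch: three parallel streams with different kernel sizes
    fill the hole region; their outputs are concatenated and mapped back to the input
    feature size by a 1x1 convolution."""

    def __init__(self, ch=256, kernels=(3, 5, 7)):
        super().__init__()
        self.streams = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, k, padding=k // 2), nn.ReLU(inplace=True))
            for k in kernels])
        self.merge = nn.Conv2d(ch * len(kernels), ch, kernel_size=1)

    def forward(self, f):
        return self.merge(torch.cat([s(f) for s in self.streams], dim=1))

# Pixel-level losses of Eqs. (1-1)/(1-2): g(.) is a 1x1 convolution mapping the branch
# output to a 3-channel color image; the structure branch is compared against the
# edge-preserving smoothed structure image I_st, the texture branch against the real
# image I_gt (both assumed resized to the feature resolution here).
g_st, g_te, l1 = nn.Conv2d(256, 3, 1), nn.Conv2d(256, 3, 1), nn.L1Loss()

def reconstruction_losses(f_cst, f_cte, i_st, i_gt):
    l_rst = l1(g_st(f_cst), i_st)   # Eq. (1-1)
    l_rte = l1(g_te(f_cte), i_gt)   # Eq. (1-2)
    return l_rst, l_rte

# usage with the branch outputs F_cst / F_cte (random stand-ins here)
f_cst = MultiScaleFillBranch()(torch.randn(1, 256, 32, 32))
f_cte = MultiScaleFillBranch()(torch.randn(1, 256, 32, 32))
l_rst, l_rte = reconstruction_losses(f_cst, f_cte,
                                     torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32))
```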
(2) Soft-gated dual feature fusion module
In this module, the structural features F_cst and the texture features F_cte generated by the two branches are combined more effectively. The two types of information are exchanged, and soft gating dynamically controls the mixing ratio to achieve dynamic combination. Specifically, to construct structure-guided texture features, a soft gate G_te controls the refinement of the texture information. It is defined as:

G_te = σ(SE(h([F_cst, F_cte])))  (2-1)

where h(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, and σ(·) is the Sigmoid activation function. Using the soft gate G_te, F_cst can be dynamically fused into F_cte:

F'_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte  (2-2)

where α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Likewise, the texture-guided structural feature F'_cst is defined as:

G_st = σ(SE(k([F_cst, F_cte])))  (2-3)

F'_cst = γ(G_st ⊙ F_cte) ⊕ F_cst  (2-4)

where k(·) is the same type of operation as h(·) and γ is a learnable parameter.

Finally, F'_cte and F'_cst are concatenated, and the feature F_fu is generated using a convolution operation v with kernel size 1:

F_fu = v([F'_cst, F'_cte])  (2-5)
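The following PyTorch sketch gives one plausible reading of the soft-gated dual feature fusion of Eqs. (2-1)-(2-5). The SE reduction ratio, the initialization of α, β and γ, and the exact grouping of the gated product in Eq. (2-2) are assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: re-weights channels (the SE(.) operation)."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.fc(x)

class SoftGatedDualFeatureFusion(nn.Module):
    """Soft-gated dual feature fusion (SDFF) following Eqs. (2-1)-(2-5)."""
    def __init__(self, ch=256):
        super().__init__()
        self.h = nn.Conv2d(2 * ch, ch, 3, padding=1)   # h(.) in Eq. (2-1)
        self.k = nn.Conv2d(2 * ch, ch, 3, padding=1)   # k(.) in Eq. (2-3)
        self.se_te, self.se_st = SEBlock(ch), SEBlock(ch)
        self.alpha = nn.Parameter(torch.zeros(1))
        self.beta = nn.Parameter(torch.ones(1))
        self.gamma = nn.Parameter(torch.zeros(1))
        self.v = nn.Conv2d(2 * ch, ch, 1)              # v(.) in Eq. (2-5)

    def forward(self, f_cst, f_cte):
        cat = torch.cat([f_cst, f_cte], dim=1)
        g_te = torch.sigmoid(self.se_te(self.h(cat)))                         # Eq. (2-1)
        f_cte_p = self.alpha * (self.beta * (g_te * f_cte) * f_cst) + f_cte   # Eq. (2-2)
        g_st = torch.sigmoid(self.se_st(self.k(cat)))                         # Eq. (2-3)
        f_cst_p = self.gamma * (g_st * f_cte) + f_cst                         # Eq. (2-4)
        return self.v(torch.cat([f_cst_p, f_cte_p], dim=1))                   # Eq. (2-5)

# usage with the two branch outputs
f_fu = SoftGatedDualFeatureFusion()(torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32))
```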
(3) Bilateral propagation feature aggregation module
This module is proposed to re-weight the channels and the spatial positions so that the image representation is more consistent. First, channel information is captured through adaptive kernel selection using a dynamic kernel selection network (SKNet), yielding the feature map F'_fu; this enhances the correlation between channels and maintains the consistency of the whole image. A context attention (CA) module is then introduced to capture the association between image blocks. Specifically, the input feature F'_fu is divided into blocks of 3 × 3 pixels, and the cosine similarity between blocks is calculated:

s_(i,j) = ⟨ p_i / ||p_i||, p_j / ||p_j|| ⟩  (3-1)

where p_i and p_j are the i-th and j-th blocks of the input feature F'_fu.

We use the Softmax function to obtain the attention score between each pair of blocks:

s'_(i,j) = exp(s_(i,j)) / Σ_(j=1..N) exp(s_(i,j))  (3-2)

where N is the total number of blocks of the input feature F'_fu. Next, the feature map is reconstructed using the attention scores:

p'_i = Σ_(j=1..N) s'_(i,j) · p_j  (3-3)

The reconstructed feature map F_ca is obtained by directly recombining the blocks p'_i.
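A direct, self-contained sketch of the context attention reconstruction of Eqs. (3-1)-(3-3) is given below; it builds the dense N × N similarity matrix explicitly and assumes the spatial size is divisible by the 3 × 3 block size.

```python
import torch
import torch.nn.functional as F

def contextual_attention(feat, patch=3):
    """Context attention of Eqs. (3-1)-(3-3): split the feature map into non-overlapping
    patch x patch blocks, score every pair of blocks by cosine similarity, normalise the
    scores with Softmax, and rebuild each block as the attention-weighted sum of all blocks."""
    b, c, hgt, wdt = feat.shape
    assert hgt % patch == 0 and wdt % patch == 0
    # (B, C*patch*patch, N) -> (B, N, D): one row per block p_i
    blocks = F.unfold(feat, kernel_size=patch, stride=patch).transpose(1, 2)
    normed = F.normalize(blocks, dim=-1)
    sim = torch.bmm(normed, normed.transpose(1, 2))      # cosine similarity,     Eq. (3-1)
    attn = F.softmax(sim, dim=-1)                        # attention scores,      Eq. (3-2)
    recon = torch.bmm(attn, blocks)                      # weighted recombination, Eq. (3-3)
    return F.fold(recon.transpose(1, 2), output_size=(hgt, wdt),
                  kernel_size=patch, stride=patch)

# usage on the channel-reweighted feature F'_fu (spatial size assumed divisible by 3)
f_fu_prime = torch.randn(1, 256, 48, 48)
f_ca = contextual_attention(f_fu_prime)   # reconstructed feature map F_ca
```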
In the spatial and range domains, we introduce a bilateral propagation activation (BPA) module to generate response values based on range and spatial distance. The response values are calculated as follows:

F^s_i = (1 / C(x)) Σ_(j∈s) g_αs(i, j) · f(x_i, x_j) · x_j  (3-4)

F^r_i = (1 / C(x)) Σ_(j∈v) f(x_i, x_j) · x_j  (3-5)

where x_i is the i-th feature channel of the input feature F_ca, x_j are the neighboring feature channels at positions j around channel i, g_αs(·) is a Gaussian function that adjusts the spatial contributions of the neighboring feature channels, C(x) is the number of positions in F_ca, and f(·) is a dot-product operation. In the spatial domain, j is explored within the neighborhood s for global propagation; s is set to the same size as the input feature in the experiments. In the range domain, v is a neighborhood of position i whose size is set to 3 × 3. In this way, the feature maps F^s and F^r are obtained by the spatial and range similarity measures, respectively.
each feature channel can compute:
Figure BDA0003488754240000128
where q represents the convolutional layer and the kernel size is 1.
Next, the channels are aggregated to obtain the reconstructed feature map F_bp. Finally, F'_fu and F_bp are concatenated and convolved to obtain F_sc:

F_sc = z([F'_fu, F_bp])  (3-7)

where z is a convolution operation with a convolution kernel size of 1.
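The sketch below illustrates one possible dense implementation of the bilateral propagation and fusion of Eqs. (3-4)-(3-7), treating i and j as spatial positions of the reconstructed map F_ca; the Gaussian bandwidth sigma, the O(N²) pairwise formulation and the thresholded 3 × 3 neighbourhood are assumptions.

```python
import torch
import torch.nn as nn

class BilateralPropagationActivation(nn.Module):
    """Bilateral propagation sketch for Eqs. (3-4)-(3-7): a spatial branch propagates
    information globally with a Gaussian distance weight times a dot-product (range)
    similarity, a range branch uses the dot-product similarity over a local 3x3 window,
    the two maps are fused by a 1x1 convolution q, and the result is concatenated with
    F'_fu and fused by another 1x1 convolution z."""

    def __init__(self, ch=256, sigma=8.0):
        super().__init__()
        self.sigma = sigma
        self.q = nn.Conv2d(2 * ch, ch, kernel_size=1)   # q(.) in Eq. (3-6)
        self.z = nn.Conv2d(2 * ch, ch, kernel_size=1)   # z(.) in Eq. (3-7)

    def forward(self, f_fu_prime, f_ca):
        b, c, h, w = f_ca.shape
        n = h * w
        x = f_ca.flatten(2).transpose(1, 2)                    # (B, N, C), rows x_i
        aff = torch.bmm(x, x.transpose(1, 2))                  # f(x_i, x_j): dot products
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
        pos = torch.stack([ys.flatten(), xs.flatten()], 1).float().to(f_ca.device)
        dist = torch.cdist(pos, pos)                           # pixel-grid distances
        gauss = torch.exp(-dist.pow(2) / (2 * self.sigma ** 2)).unsqueeze(0)
        local = (dist <= 1.5).float().unsqueeze(0)             # 3x3 neighbourhood v
        f_s = torch.bmm(aff * gauss, x) / n                    # Eq. (3-4), spatial branch
        f_r = torch.bmm(aff * local, x) / n                    # Eq. (3-5), range branch
        f_s = f_s.transpose(1, 2).reshape(b, c, h, w)
        f_r = f_r.transpose(1, 2).reshape(b, c, h, w)
        f_bp = self.q(torch.cat([f_s, f_r], dim=1))            # Eq. (3-6), per-position fusion
        return self.z(torch.cat([f_fu_prime, f_bp], dim=1))    # Eq. (3-7), F_sc

# usage with the channel-reweighted map F'_fu and the context-attention map F_ca
f_fu_prime = torch.randn(1, 256, 32, 32)
f_ca = torch.randn(1, 256, 32, 32)
f_sc = BilateralPropagationActivation()(f_fu_prime, f_ca)
```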
(4) Discriminators
The invention introduces global and local discriminators to make the local and global image content more consistent. Each discriminator consists of five convolutional layers with a convolution kernel size of 4 and a stride of 2; all layers except the last use Leaky ReLU with a slope of 0.2. In addition, spectral normalization is employed to stabilize training.
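A sketch of one such discriminator, matching the five 4 × 4 stride-2 convolutions, Leaky ReLU with slope 0.2 on all but the last layer, and the spectral normalization described above, is shown below; the channel widths and the single-channel patch-score output are assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def make_discriminator(in_ch=3, base=64):
    """Five-layer discriminator used both globally and locally: 4x4 convolutions with
    stride 2, spectral normalisation, and LeakyReLU(0.2) on every layer but the last."""
    chans = [in_ch, base, base * 2, base * 4, base * 8]
    layers = []
    for i in range(4):
        layers += [spectral_norm(nn.Conv2d(chans[i], chans[i + 1], 4, stride=2, padding=1)),
                   nn.LeakyReLU(0.2, inplace=True)]
    # last layer: no activation; outputs a patch-wise realism score map
    layers.append(spectral_norm(nn.Conv2d(chans[-1], 1, 4, stride=2, padding=1)))
    return nn.Sequential(*layers)

# the global discriminator sees the whole repaired image, the local one a crop
# around the hole region (an assumption about how "local" is defined)
global_d = make_discriminator()
local_d = make_discriminator()
score = global_d(torch.randn(1, 3, 256, 256))
```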
The above is only a preferred embodiment of the present invention, and it should be noted that the above preferred embodiment should not be considered as limiting the present invention, and the protection scope of the present invention should be subject to the scope defined by the claims. It will be apparent to those skilled in the art that several modifications, substitutions, improvements and embellishments of the steps can be made without departing from the spirit and scope of the invention, and these modifications, substitutions, improvements and embellishments should also be construed as the scope of the invention.

Claims (10)

1. An image restoration method based on depth multi-feature collaborative learning comprises the following steps:
s1, inputting an image to be restored into a preset image feature encoder, and performing effective feature extraction on the image to be restored through deep neural network encoding to form an effective image feature set;
s2, decoding and repairing the effective image characteristic set through a preset image decoder, and forming a repaired image after passing through a local discriminator and a global discriminator;
the image feature encoder is characterized by comprising six convolutional layers, wherein three shallow convolutional layers are used for reorganizing texture features to represent image details, and three deep convolutional layers are used for reorganizing structural features to represent image semantics to obtain a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural features and the texture features, and a bilateral propagation feature aggregation module for balancing the features among channel information, context attention and feature space.
2. An image inpainting method as claimed in claim 1, characterized in that the texture features and the structural features each fill the damaged area using three parallel streams with different kernel sizes, the outputs of the three streams are combined into an output feature map, the output feature map is then mapped back to the same size as the input features, and the outputs of the structural features and the texture features satisfy:

L_rst = ||g(F_cst) - I_st||_1  (1-1)

L_rte = ||g(F_cte) - I_gt||_1  (1-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution operation with kernel size 1 that maps F_cst and F_cte to color images, I_gt and I_st denote the real image and its structure image respectively, and I_st is generated using an edge-preserving image smoothing method.
3. The image inpainting method of claim 1, wherein the soft-gated dual feature fusion module comprises a structure-guided texture feature unit configured to compute:

G_te = σ(SE(h([F_cst, F_cte])))  (2-1)

F'_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte  (2-2)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, h(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te is a soft gate that controls the degree of refinement of the texture information, F'_cte denotes the texture feature with structure perception, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
4. The image inpainting method of claim 1, wherein the soft-gated dual feature fusion module comprises a texture-guided structure feature unit configured to compute:

G_st = σ(SE(k([F_cst, F_cte])))  (2-3)

F'_cst = γ(G_st ⊙ F_cte) ⊕ F_cst  (2-4)

where F_cst and F_cte denote the output features of the structure and texture branches obtained by concatenating the multi-scale filling streams, k(·) is a convolution operation with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st is a soft gate that controls the degree of refinement of the structure information, F'_cst denotes the structural feature with texture perception, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition; and

F_fu = v([F'_cst, F'_cte])  (2-5)

where F'_cte and F'_cst denote the texture feature with structure perception and the structural feature with texture perception respectively, v(·) is a convolution operation with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
5. The image inpainting method of claim 1, wherein the bilateral propagation feature aggregation module comprises a channel information fusion unit for capturing channel information through adaptive kernel selection using a dynamic kernel selection network to obtain the feature map F'_fu.
6. The image inpainting method of claim 5, wherein the bilateral propagation feature aggregation module includes a context attention fusion unit configured to capture the relationship between input feature blocks by computing cosine similarity, specifically:

s_(i,j) = ⟨ p_i / ||p_i||, p_j / ||p_j|| ⟩  (3-1)

s'_(i,j) = exp(s_(i,j)) / Σ_(j=1..N) exp(s_(i,j))  (3-2)

p'_i = Σ_(j=1..N) s'_(i,j) · p_j  (3-3)

where the feature F'_fu is divided into non-overlapping blocks, s_(i,j) denotes the cosine similarity between feature blocks, s'_(i,j) denotes the attention score obtained by the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F'_fu, N is the total number of blocks of F'_fu, and the feature map F_ca is reconstructed by recombining the feature blocks according to the attention scores.
7. The image inpainting method of claim 1, wherein the bilateral propagation feature aggregation module includes a spatial information fusion unit, which specifically computes:

F^s_i = (1 / C(x)) Σ_(j∈s) g_αs(i, j) · f(x_i, x_j) · x_j  (3-4)

F^r_i = (1 / C(x)) Σ_(j∈v) f(x_i, x_j) · x_j  (3-5)

where F^s and F^r denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature F_ca, x_j are the neighboring feature channels at positions j around channel i, g_αs(·) is a Gaussian function that adjusts the spatial contributions of the neighboring feature channels, C(x) is the number of positions in F_ca, and f(·) is a dot-product operation; in the spatial domain, j is explored within the neighborhood s for global propagation, and in the range domain, v is a neighborhood of position i whose size is set to 3 × 3.
8. The image inpainting method of claim 7, wherein the output feature channel is calculated as:

x'_i = q([F^s_i, F^r_i])  (3-6)

where F^s and F^r denote the spatial and range similarity feature maps, and q denotes a convolutional layer with kernel size 1.
9. An image inpainting method as claimed in claim 8, wherein the channel features are aggregated to obtain the reconstructed feature map F_bp, and F'_fu and F_bp are then fused by concatenation and convolution to obtain F_sc:

F_sc = z([F'_fu, F_bp])  (3-7)

where F_bp is the recombined multi-channel feature, F'_fu is the feature obtained after re-weighting the channel information, F_sc is the final fused repair feature, and z is a convolution operation with a convolution kernel size of 1.
10. The image inpainting method of claim 1, wherein the global and local discriminators each consist of five convolutional layers with a convolution kernel size of 4 and a stride of 2, all layers except the last use Leaky ReLU with a slope of 0.2, and spectral normalization is used to achieve stable training.
CN202210089664.4A 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning Active CN114463209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210089664.4A CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210089664.4A CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Publications (2)

Publication Number Publication Date
CN114463209A true CN114463209A (en) 2022-05-10
CN114463209B CN114463209B (en) 2022-12-16

Family

ID=81410572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210089664.4A Active CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Country Status (1)

Country Link
CN (1) CN114463209B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897742A (en) * 2022-06-10 2022-08-12 重庆师范大学 Image restoration method with texture and structural features fused twice
CN115082743A (en) * 2022-08-16 2022-09-20 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method
CN115841625A (en) * 2023-02-23 2023-03-24 杭州电子科技大学 Remote sensing building image extraction method based on improved U-Net model
CN116681980A (en) * 2023-07-31 2023-09-01 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
WO2023225808A1 (en) * 2022-05-23 2023-11-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Learned image compress ion and decompression using long and short attention module
CN117196981A (en) * 2023-09-08 2023-12-08 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation
CN117422911A (en) * 2023-10-20 2024-01-19 哈尔滨工业大学 Collaborative learning driven multi-category full-slice digital pathological image classification system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460746A (en) * 2018-04-10 2018-08-28 武汉大学 A kind of image repair method predicted based on structure and texture layer
CN112365422A (en) * 2020-11-17 2021-02-12 重庆邮电大学 Irregular missing image restoration method and system based on deep aggregation network
US20210125313A1 (en) * 2019-10-25 2021-04-29 Samsung Electronics Co., Ltd. Image processing method, apparatus, electronic device and computer readable storage medium
CN113298733A (en) * 2021-06-09 2021-08-24 华南理工大学 Implicit edge prior based scale progressive image completion method
WO2021232589A1 (en) * 2020-05-21 2021-11-25 平安国际智慧城市科技股份有限公司 Intention identification method, apparatus and device based on attention mechanism, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460746A (en) * 2018-04-10 2018-08-28 武汉大学 A kind of image repair method predicted based on structure and texture layer
US20210125313A1 (en) * 2019-10-25 2021-04-29 Samsung Electronics Co., Ltd. Image processing method, apparatus, electronic device and computer readable storage medium
WO2021232589A1 (en) * 2020-05-21 2021-11-25 平安国际智慧城市科技股份有限公司 Intention identification method, apparatus and device based on attention mechanism, and storage medium
CN112365422A (en) * 2020-11-17 2021-02-12 重庆邮电大学 Irregular missing image restoration method and system based on deep aggregation network
CN113298733A (en) * 2021-06-09 2021-08-24 华南理工大学 Implicit edge prior based scale progressive image completion method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIANG LI等: "Selective Kernel Networks", 《2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
XIEFAN GUO等: "Image Inpainting via Conditional Texture and Structure Dual Generation", 《2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 *
SONG ZHONGSHAN et al.: "Remote sensing scene classification based on bidirectional gated scale feature fusion", Journal of Computer Applications *
FAN CHUNQI et al.: "Recent advances in deep-learning-based digital image inpainting algorithms", Journal of Signal Processing *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023225808A1 (en) * 2022-05-23 2023-11-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Learned image compress ion and decompression using long and short attention module
CN114897742A (en) * 2022-06-10 2022-08-12 重庆师范大学 Image restoration method with texture and structural features fused twice
CN115082743A (en) * 2022-08-16 2022-09-20 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method
CN115082743B (en) * 2022-08-16 2022-12-06 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method
CN115841625A (en) * 2023-02-23 2023-03-24 杭州电子科技大学 Remote sensing building image extraction method based on improved U-Net model
CN116681980A (en) * 2023-07-31 2023-09-01 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
CN116681980B (en) * 2023-07-31 2023-10-20 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
CN117196981A (en) * 2023-09-08 2023-12-08 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation
CN117196981B (en) * 2023-09-08 2024-04-26 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation
CN117422911A (en) * 2023-10-20 2024-01-19 哈尔滨工业大学 Collaborative learning driven multi-category full-slice digital pathological image classification system
CN117422911B (en) * 2023-10-20 2024-04-30 哈尔滨工业大学 Collaborative learning driven multi-category full-slice digital pathological image classification system

Also Published As

Publication number Publication date
CN114463209B (en) 2022-12-16

Similar Documents

Publication Publication Date Title
CN114463209B (en) Image restoration method based on deep multi-feature collaborative learning
CN109447907B (en) Single image enhancement method based on full convolution neural network
CN111242238B (en) RGB-D image saliency target acquisition method
CN110689495B (en) Image restoration method for deep learning
CN111787187B (en) Method, system and terminal for repairing video by utilizing deep convolutional neural network
CN110223251B (en) Convolution neural network underwater image restoration method suitable for artificial and natural light sources
CN112991231B (en) Single-image super-image and perception image enhancement joint task learning system
CN114897742B (en) Image restoration method with texture and structural features fused twice
CN110349087A (en) RGB-D image superior quality grid generation method based on adaptability convolution
CN112422870B (en) Deep learning video frame insertion method based on knowledge distillation
CN114820341A (en) Image blind denoising method and system based on enhanced transform
CN113989129A (en) Image restoration method based on gating and context attention mechanism
CN115239564B (en) Mine image super-resolution reconstruction method combining semantic information
CN115170915A (en) Infrared and visible light image fusion method based on end-to-end attention network
CN116958534A (en) Image processing method, training method of image processing model and related device
CN116485741A (en) No-reference image quality evaluation method, system, electronic equipment and storage medium
CN115829876A (en) Real degraded image blind restoration method based on cross attention mechanism
CN115829880A (en) Image restoration method based on context structure attention pyramid network
CN116109510A (en) Face image restoration method based on structure and texture dual generation
CN112785502A (en) Light field image super-resolution method of hybrid camera based on texture migration
CN116523985A (en) Structure and texture feature guided double-encoder image restoration method
CN115035170A (en) Image restoration method based on global texture and structure
CN116167920A (en) Image compression and reconstruction method based on super-resolution and priori knowledge
JPS62131383A (en) Method and apparatus for evaluating movement of image train
CN114820316A (en) Video image super-resolution recovery system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant