CN116309155A - Image restoration method, model and device based on convolution and converter hybrid network - Google Patents

Publication number
CN116309155A
Authority
CN
China
Prior art keywords
wall painting
model
image
repair
tomb wall
Prior art date
Legal status
Pending
Application number
CN202310256040.1A
Other languages
Chinese (zh)
Inventor
钟涵文
蓝善祯
张岳
李绍彬
章勇勤
李若彤
杨慧祎
尉婉婷
Current Assignee
Communication University of China
Original Assignee
Communication University of China
Priority date
Application filed by Communication University of China filed Critical Communication University of China
Priority to CN202310256040.1A priority Critical patent/CN116309155A/en
Publication of CN116309155A publication Critical patent/CN116309155A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 5/77 Retouching; Inpainting; Scratch removal
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T 2207/20048 Transform domain processing
    • G06T 2207/20056 Discrete and fast Fourier transform [DFT, FFT]
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image restoration method, model and device based on a convolution and converter hybrid network. The model construction method comprises the following steps. Step 1: collect image data of tomb wall paintings, preprocess it and manually annotate it to obtain a data set for the tomb wall painting detection model; determine a training set and a verification set for the repair model. Step 2: input the manually annotated data set into the detection model for training. Step 3: detect the damaged tomb wall paintings and convert the results into binary mask images to obtain the repair test set images. Step 4: design a damage repair model. Step 5: train with the repair data set to obtain the tomb wall painting repair model. Step 6: test the repair model with the test set to realize automatic restoration of the damaged areas. With the invention, the detection effect of the tomb wall painting detection model is more accurate; in addition, the hybrid repair network globally restores the texture information of the image with a Transformer model and, combined with a fast Fourier transform method, recovers a restored image of higher quality.

Description

Image restoration method, model and device based on convolution and converter hybrid network
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image restoration method, model and device based on a hybrid network of deep convolution and a vision Transformer.
Background
Tomb wall paintings are a precious cultural heritage, and their protection has long concerned archaeologists and cultural-relic conservators in China and abroad. Under the influence of natural disasters, human factors and climate, degradation phenomena such as fading, distortion and flaking of tomb wall paintings are gradually worsening and are difficult to reverse, so these already scarce, non-renewable cultural resources are shrinking rapidly. In recent years, through the joint efforts of experts and scholars, great progress has been made in the protection of wall paintings, but owing to the limitations of research modes and technical conditions, most research focuses on repairing local areas of wall paintings, and the results are mostly presented as papers, archaeological reports, restoration design drawings and books. Long-term protection of tomb wall paintings and exploration of architectural reconstruction remain rare and insufficient, and the quantity and quality of tomb wall painting image information directly affect subsequent research. Undeniably, traditional methods such as evidence collection, manual tracing and field investigation played an irreplaceable role in early research, but they cannot solve the resulting problems of low image precision, low material utilization, serious loss of data information, and difficulty in reconstructing the three-dimensional space of murals and sculptures.
For damaged parts of wall paintings, cultural-relic conservators currently rely on manual repair. Manual repair, however, places severe technical demands on the restorer: years of working experience together with high expertise and artistic skill are required, and the slightest oversight during repair can cause irreversible loss to the wall painting.
With the advancement of digital acquisition in modern society, a large number of mural collections have been gathered and digitally displayed. This provides support for digital image analysis using deep-learning algorithms. The invention uses digital image analysis algorithms to extract high-dimensional features of tomb wall paintings, learns feature information from a large training set, detects the damaged areas of a tomb wall painting with a detection network, and then digitally restores the detected damaged areas with a Transformer-based repair network. Digital image processing can detect damaged areas in an image through mathematical prior models or algorithms, and digital image restoration performs the repair without contact, which avoids the secondary damage to the physical image risked by traditional manual methods; the restoration results are also easy to preserve and cannot be damaged. Thus, constructing a digital data set and using convolutional neural networks and Transformers to identify and repair damaged areas of tomb wall paintings is the primary task of the invention.
Disclosure of Invention
Aiming at the defects and shortcomings of the prior art, the invention provides an image restoration method, model and device based on a convolution and converter hybrid network. By establishing a detection model and a repair model, high-quality automatic restoration of damaged areas of tomb wall paintings is realized, technical support is provided for the application and popularization of deep learning in the digital protection of cultural heritage, and the problem that damaged areas of tomb wall paintings cannot be automatically detected and restored in the prior art is solved. The invention accurately detects the damaged areas of tomb wall paintings with a purpose-designed detection network model, and automatically restores the missing areas with complete and consistent texture through the designed repair model.
In order to achieve the above purpose, the invention adopts the following technical scheme:
in one aspect, an image restoration model construction method based on a convolution and converter hybrid network is provided, which includes the following steps:
step 1: collecting image data of the tomb wall painting by using digital equipment; the collected images comprise both intact tomb wall paintings and damaged tomb wall paintings; the collected image data are preprocessed, and the tomb wall paintings with damaged areas are manually annotated with annotation software to obtain the tomb wall painting detection model data set; the intact tomb wall painting images are taken as the training set and verification set of the tomb wall painting repair model;
step 2: inputting the manually marked tomb wall painting data set in the step 1 into a tomb wall painting damage area detection model for training to obtain a trained detection model;
step 3: performing damage detection on the tomb wall painting with damage by using the trained detection model obtained in the step 2, and converting the detection result into a binary mask image to obtain a repair test set image of the tomb wall painting;
step 4: designing a visual feature converter-based damage repair model, wherein the visual feature converter-based damage repair model consists of a Transformer-based structural texture repair model and a content filling model; the structural texture repair model adopts a fast Fourier convolution layer to learn a frequency domain and combines a structural feature encoder formed by a full convolution network;
step 5: training a damage repair model based on a visual characteristic converter by utilizing the grave mural repair data set obtained in the step 1 to obtain a grave mural repair model;
step 6: and (3) testing the trained hybrid network detection and repair model based on convolution and a converter by using the tomb wall painting test set obtained in the step (3) so as to realize automatic repair of the damaged region of the tomb wall painting.
Further, in the step 2, the tomb wall painting damage area detection model is used for realizing the following procedures:
step 21, firstly, 5 feature vectors with different scales are obtained from an input image through a multi-scale feature extraction network, and three feature vectors with minimum scales are sent into a feature pyramid for sampling and splicing; obtaining 5 feature graphs with different sizes after sampling and splicing; transversely connecting a multi-scale feature extraction network with a feature pyramid, and fusing low-level features and deep features by utilizing a bottom-up connection mode of the feature pyramid to obtain features with different sizes;
step 22, in the prediction stage, the prediction branch adopts an improved anchor-based target detector, which comprises a classification layer, a regression layer and a mask coefficient layer; the classification layer is used for outputting confidence scores of target categories, the regression layer is used for outputting prediction frame results, the mask coefficient layer is used for predicting K generated mask coefficients, and each coefficient corresponds to each generated prototype; and (3) taking the feature layer result of the feature pyramid network obtained in the step (21) as the input of a prediction branch, and obtaining three groups of data in a way of sharing a convolution layer: regression coordinates and mask coefficients of the category to which the anchor point belongs and the prediction frame.
Further, in the step 2, the loss function is composed of 3 parts:
Loss = Loss_cls + α·Loss_box + β·Loss_mask (formula I)

wherein Loss_cls, Loss_box and Loss_mask respectively represent the classification loss function, the prediction-box loss function and the mask loss function, and α and β are weight coefficients; the classification loss function is the cross-entropy loss, the prediction-box loss function adopts the smooth L1 loss function, and the mask loss function is the pixel-by-pixel binary cross entropy between the prediction mask and the ground-truth mask:

Loss_mask = BCE(M, M_gt) (formula II)

wherein BCE refers to the binary cross-entropy loss function, M is the prediction mask, and M_gt is the ground-truth mask.
Further, in the step 4, the loss function is composed of the following four parts:
loss = α·L_l1 + β·L_adv + γ·L_fm + μ·L_hrf (formula III)

wherein α, β, γ and μ are the weights of the respective loss terms, L_fm is the feature-matching loss, L_adv consists of the discrimination loss and the generation loss, and L_hrf is a perceptual (visual) loss; here α = 10, β = 10, γ = 100, μ = 30.
In another aspect, the present invention provides an image restoration model construction apparatus based on a convolution and converter hybrid network, including:
the data acquisition module is used for acquiring a tomb wall painting data set;
the tomb wall painting damage detection model module is used for training the preprocessed tomb wall painting data set to obtain a tomb wall painting damage area detection model, and the model is used for detecting the damage area of the input tomb wall painting image; automatically outputting the detection result as a binary image, wherein the output detection result and the corresponding binary image are used for repairing a tomb wall painting damaged area repairing model;
the tomb wall painting damage repair model module is used for training the preprocessed undamaged tomb wall painting data set to obtain a tomb wall painting damage area repair model; the model is used for repairing the input damaged tomb wall painting image and the corresponding binary image thereof to obtain a repaired tomb wall painting image.
In a third aspect, the invention provides an image restoration method based on a hybrid network of deep convolution and a vision Transformer, which comprises: inputting the tomb wall painting image to be restored into the trained tomb wall painting detection model to obtain the detected damaged-area image and its corresponding binary image, and inputting the damaged-area image and the binary image into the trained tomb wall painting repair model to obtain the restored tomb wall painting image; the trained tomb wall painting detection model and tomb wall painting repair model are obtained through the image restoration model construction method based on the convolution and converter hybrid network.
In a fourth aspect, the present invention provides a computer device comprising a memory storing a computer program and a processor implementing the convolution and converter hybrid network based image restoration model construction method of the present invention when the computer program is executed.
In a fifth aspect, the present invention provides a computer readable storage medium for storing program instructions executable by a processor to implement the convolution and converter hybrid network-based image restoration model construction method of the present invention.
Compared with the prior art, the invention has the following beneficial effects:
according to the invention, the design of the multi-scale feature extraction network and the feature pyramid is combined, so that the detection effect of the tomb wall painting detection model is more accurate compared with that of the conventional method, in addition, the texture information of the image is globally restored by using a transducer model through the hybrid restoration network based on the depth convolution and the visual transducer, and the restoration image with higher quality is restored by combining a fast Fourier transform method.
Drawings
FIG. 1 is a flow chart of a method for detecting and repairing damaged areas of a grave wall painting according to the invention.
Fig. 2 is a schematic structural view of a tomb wall painting damage area detection model in the present invention.
Fig. 3 is a schematic structural view of a repair model for a broken region of a burial wall painting in the present invention.
In fig. 4, (a) is an image to be repaired, (b) is a binary image of the detected damaged area, and (c) is a post-repair image.
In fig. 5, (a) is an image to be repaired, (b) is a binary image of the detected damaged area, and (c) is a post-repair image.
Detailed Description
Embodiment one:
as shown in fig. 1 to 3, the image restoration model construction method based on the convolution and converter hybrid network provided in the present embodiment includes the following steps:
step 1: collecting image data of the tomb wall painting by using digital equipment; the collected tomb wall painting image comprises a complete tomb wall painting and a tomb wall painting image with damage, the collected tomb wall painting image data are preprocessed, and the tomb wall painting with the damage area is manually marked by marking software to obtain a tomb wall painting detection model data set. And taking the perfect tomb wall painting image as a training set and a verification set of the tomb wall painting repair model.
Step 2: and (3) inputting the manually marked tomb wall painting data set in the step (1) into a tomb wall painting damage area detection model for training to obtain a trained detection model.
The tomb wall painting damage area detection model is used for realizing the following procedures:
step 21, firstly, 5 feature vectors with different scales are obtained from an input image through a multi-scale feature extraction network, and three feature vectors with minimum scales are sent into a feature pyramid for sampling and splicing; and 5 feature maps with different sizes are obtained after sampling and splicing. The method comprises the steps of transversely connecting a multi-scale feature extraction network with a feature pyramid, and fusing low-level features with deep features by utilizing a bottom-up connection mode of the feature pyramid to obtain features with different sizes.
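The multi-scale fusion described in step 21 can be sketched roughly as follows. The class name `FPNFusion`, the channel sizes, and the nearest-neighbor upsampling direction of the fusion are illustrative assumptions, not details given in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNFusion(nn.Module):
    """Minimal feature-pyramid fusion sketch (channel sizes are illustrative)."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # Lateral 1x1 convolutions align backbone channels before fusion.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):
        # feats: the three smallest-scale backbone maps, finest first.
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Fuse from the coarsest (deepest) level back toward the finest,
        # so deep semantics and low-level detail are combined at every scale.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(x) for s, x in zip(self.smooth, laterals)]

fpn = FPNFusion()
feats = [torch.randn(1, 512, 64, 64), torch.randn(1, 1024, 32, 32),
         torch.randn(1, 2048, 16, 16)]
outs = fpn(feats)
```

In the actual model the fused maps at each scale would feed the prediction branch described in step 22.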
In step 22, during the prediction stage, an anchor-based object detector (prediction head) generally includes a classification layer and a regression layer. The prediction branch in the invention adopts an improved anchor-based object detector that includes not only a classification layer and a regression layer but also a mask coefficient layer. The classification layer outputs the confidence scores of the target categories, the regression layer outputs the prediction-box results, and the mask coefficient layer predicts K mask coefficients, each coefficient corresponding to a generated prototype.
And (3) taking the feature layer result of the feature pyramid network obtained in the step (21) as the input of a prediction branch, and obtaining three groups of data in a way of sharing a convolution layer: regression coordinates and mask coefficients of the category to which the anchor point belongs and the prediction frame.
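A minimal sketch of such a three-branch head with a shared convolution layer is given below. The anchor count, the class count, K = 32 prototypes and the tanh on the coefficients are assumptions for illustration, not values stated in the text.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Anchor-based head with an added mask-coefficient branch (sketch)."""
    def __init__(self, in_ch=256, num_anchors=3, num_classes=2, k_protos=32):
        super().__init__()
        self.shared = nn.Conv2d(in_ch, in_ch, 3, padding=1)  # shared conv layer
        self.cls = nn.Conv2d(in_ch, num_anchors * num_classes, 3, padding=1)
        self.box = nn.Conv2d(in_ch, num_anchors * 4, 3, padding=1)
        self.coef = nn.Conv2d(in_ch, num_anchors * k_protos, 3, padding=1)

    def forward(self, x):
        x = torch.relu(self.shared(x))
        # Three groups of data per anchor: class scores, box regression
        # coordinates, and K mask coefficients (squashed to [-1, 1] here).
        return self.cls(x), self.box(x), torch.tanh(self.coef(x))

head = PredictionHead()
cls, box, coef = head(torch.randn(1, 256, 16, 16))
```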
The loss function in step 2 consists of 3 parts:

Loss = Loss_cls + α·Loss_box + β·Loss_mask (formula I)

wherein Loss_cls, Loss_box and Loss_mask respectively represent the classification loss function, the prediction-box loss function and the mask loss function, and α and β are weight coefficients. The classification loss function is defined as the cross-entropy loss, the prediction-box loss function adopts the smooth L1 loss function, and the mask loss function is defined as the pixel-by-pixel binary cross entropy between the prediction mask M and the ground-truth mask M_gt:

Loss_mask = BCE(M, M_gt) (formula II)

wherein BCE refers to the binary cross-entropy loss function.
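Formulas I and II can be transcribed directly as below. The α and β defaults are placeholders, since the text does not state their values.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross entropy between prediction mask M and
    ground-truth mask M_gt (formula II)."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(target * np.log(pred)
                           + (1 - target) * np.log(1 - pred))))

def detection_loss(loss_cls, loss_box, mask_pred, mask_gt,
                   alpha=1.0, beta=1.0):
    """Composite detection loss of formula I; alpha/beta are placeholder
    weights, not values given in the text."""
    return loss_cls + alpha * loss_box + beta * bce(mask_pred, mask_gt)

m_gt = np.array([[1.0, 0.0], [0.0, 1.0]])
m_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
total = detection_loss(0.5, 0.3, m_pred, m_gt)
```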
Step 3: and (3) carrying out damage detection on the tomb wall painting with damage by using the trained detection model obtained in the step (2), and converting the detection result into a binary mask image to obtain a repair test set image of the tomb wall painting.
Step 4: aiming at the characteristics of the damaged areas of tomb wall paintings, a damage repair model based on a visual feature converter is designed. The model consists of a Transformer-based structural texture repair model and a content filling model. It uses the global information extraction advantage of the Transformer to recover the texture structure of the damaged region of the tomb wall painting, and after the texture information is filled in, a deep convolutional neural network fills in the missing content information. In the field of image restoration, a Transformer can learn interactions over longer-range sequence data than a convolutional neural network, so the Transformer is used for the overall texture structure repair. Furthermore, to overcome the high time complexity of the Transformer's standard self-attention mechanism, axial attention and standard attention are alternated in the module.
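The axial attention mentioned above can be illustrated with a generic sketch built on `nn.MultiheadAttention`; this is not the patent's actual module. Attending along one spatial axis at a time reduces the O((HW)²) cost of standard self-attention over an H×W feature map to roughly O(HW·(H+W)), which is why it is alternated with standard attention.

```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Self-attention applied along one spatial axis at a time (sketch)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, axis):  # x: (B, H, W, C)
        b, h, w, c = x.shape
        if axis == "h":
            # Treat each column as a sequence of H tokens.
            seq = x.permute(0, 2, 1, 3).reshape(b * w, h, c)
        else:
            # Treat each row as a sequence of W tokens.
            seq = x.reshape(b * h, w, c)
        out, _ = self.attn(seq, seq, seq)
        if axis == "h":
            return out.reshape(b, w, h, c).permute(0, 2, 1, 3)
        return out.reshape(b, h, w, c)

blk = AxialAttention(dim=32)
x = torch.randn(2, 8, 8, 32)
y = blk(blk(x, "h"), "w")  # alternate column-wise and row-wise attention
```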
The structural texture repair model adopts a fast Fourier convolution layer to learn in the frequency domain, combined with a structural feature encoder formed by a fully convolutional network, to improve the structural restoration effect on tomb wall paintings.
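The global branch of a fast Fourier convolution can be sketched as a point-wise mixing of frequency components, which gives every output position a receptive field covering the whole image. The separate real/imaginary weight matrices below are a simplification of the learned spectral transform, not the patent's exact formulation.

```python
import numpy as np

def spectral_transform(x, w_real, w_imag):
    """Global (spectral) branch of a fast Fourier convolution, sketched:
    FFT -> point-wise channel mixing in the frequency domain -> inverse FFT.
    Weights are plain arrays here, not learned parameters."""
    freq = np.fft.rfft2(x)                       # (C, H, W//2+1), complex
    mixed = np.einsum('oc,chw->ohw', w_real, freq.real) \
          + 1j * np.einsum('oc,chw->ohw', w_imag, freq.imag)
    return np.fft.irfft2(mixed, s=x.shape[-2:])  # back to (C, H, W)

x = np.random.rand(4, 16, 16)
w = np.eye(4)                   # identity mixing for a verifiable round trip
y = spectral_transform(x, w, w)
```

With identity weights the transform reduces to an FFT round trip, so the output reproduces the input, which makes the sketch easy to check.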
The loss function of the repair model is mainly composed of the following four parts:

loss = α·L_l1 + β·L_adv + γ·L_fm + μ·L_hrf (formula III)

wherein α, β, γ and μ are the weights of the respective loss terms, L_fm is the feature-matching loss, L_adv consists of the discrimination loss and the generation loss, and L_hrf is a perceptual (visual) loss; here α = 10, β = 10, γ = 100, μ = 30.
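Formula III is a plain weighted sum of the four terms; the component loss values below are dummy numbers used only to exercise the stated weights.

```python
def repair_loss(l1, adv, fm, hrf, alpha=10, beta=10, gamma=100, mu=30):
    """Weighted sum of formula III with the weights stated in the text."""
    return alpha * l1 + beta * adv + gamma * fm + mu * hrf

# Dummy component values, chosen only for illustration:
total = repair_loss(0.05, 0.02, 0.003, 0.01)
```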
Step 5: training the damage repair model based on the visual characteristic converter by utilizing the grave mural repair data set obtained in the step 1 to obtain the grave mural repair model.
Step 6: and (3) testing the trained hybrid network detection and repair model based on convolution and a converter by using the tomb wall painting test set obtained in the step (3) so as to realize automatic repair of the damaged region of the tomb wall painting.
Embodiment two:
the embodiment provides an image restoration model construction device based on a convolution and converter mixed network, which comprises the following components:
the data acquisition module is used for acquiring a grave wall painting data set, wherein the data set is used for training and testing a grave wall painting damage detection model and a grave wall painting damage repair model;
the tomb wall painting damage detection model module is used for training the preprocessed tomb wall painting data set to obtain a tomb wall painting damage area detection model, and the tomb wall painting damage area detection model is used for detecting an input tomb wall painting image. Automatically outputting the detection result as a binary image, wherein the output detection result and the corresponding binary image are used for repairing a tomb wall painting damaged area repairing model;
and the tomb wall painting damage repair model module is used for training the preprocessed undamaged tomb wall painting data set to obtain a tomb wall painting damage area repair model. The model is used for repairing the input damaged tomb wall painting image and the corresponding binary image thereof to obtain a repaired tomb wall painting image.
Embodiment III:
the embodiment provides an image restoration method based on a convolution and converter mixed network, which is characterized in that a tomb wall painting image to be restored is input into a trained tomb wall painting detection model to obtain a detected damaged area image and a corresponding binary image thereof, and then the damaged area image and the binary image are input into the trained tomb wall painting restoration model to obtain a restored tomb wall painting image; the trained tomb wall painting detection model and the tomb wall painting repair model are obtained through the image repair model construction method based on the convolution and converter mixed network.
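The detect-then-repair pipeline of this embodiment can be sketched end to end with stand-in models; `detect_model`, `repair_model` and the 0.5 threshold are illustrative assumptions, not details given in the text.

```python
import numpy as np

def restore(image, detect_model, repair_model, threshold=0.5):
    """End-to-end sketch of the claimed pipeline:
    detection -> binary mask image -> repair."""
    damage_prob = detect_model(image)                    # per-pixel damage score
    mask = (damage_prob > threshold).astype(np.float32)  # binary mask image
    return repair_model(image, mask)

# Toy stand-ins so the pipeline runs end to end:
img = np.ones((8, 8)); img[2:4, 2:4] = 0.0      # "damaged" dark patch
detect = lambda im: 1.0 - im                    # damage where pixels are dark
repair = lambda im, m: im * (1 - m) + 1.0 * m   # fill masked pixels with 1.0
out = restore(img, detect, repair)
```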
Embodiment four:
the embodiment provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and is characterized in that the processor executes the computer program to realize the image restoration model construction method based on the convolution and converter mixed network.
Fifth embodiment:
the present embodiment provides a computer-readable storage medium for storing program instructions executable by a processor to implement the convolution and converter hybrid network-based image restoration model construction method of the present invention.
The above-described computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, in another alternative embodiment, the computer program product is embodied as a software product, or the like.
And (3) experimental verification:
In this embodiment, the images to be repaired shown in fig. 4 (a) and fig. 5 (a) are input into the tomb wall painting damage detection model, yielding the damaged-area binary images of fig. 4 (b) and fig. 5 (b) (the detection result of fig. 4 (a) is fig. 4 (b), and that of fig. 5 (a) is fig. 5 (b)). The images to be repaired, together with the damaged-area binary images of fig. 4 (b) and fig. 5 (b), are then input into the tomb wall painting damage repair model, yielding the repaired tomb wall painting images of fig. 4 (c) and fig. 5 (c) (the repair result corresponding to fig. 4 (b) is fig. 4 (c), and that corresponding to fig. 5 (b) is fig. 5 (c)).
In the experiment, a tomb wall painting data set is used. The proposed detection method is compared with the existing two-stage segmentation network Mask R-CNN on the average precision of the annotated bounding box (bbox): AP, AP50, AP75, APs, APm and APl, the six main metrics for evaluating the COCO data set; the comparison of these detection metrics is shown in Table 1, where a larger AP represents higher accuracy. The proposed restoration method is compared with the existing LAMA method on two metrics of restoration quality: peak signal-to-noise ratio (PSNR) and Fréchet Inception Distance (FID), which measures the diversity and quality of generated images; the comparison is shown in Table 2, where a larger PSNR represents better image quality, and a smaller FID means the two distributions are closer, i.e. the generated images have higher quality and better diversity.
TABLE 1 Comparison of the detection method of the present invention with the existing detection method

Method        AP↑     AP50↑   AP75↑   APs↑    APm↑    APl↑
Mask R-CNN    13.25   28.05   10.89   7.73    10.89   14.03
Proposed      19.63   35.58   17.58   12.29   29.10   40.20
TABLE 2 Comparison of the repair method of the present invention with the existing repair method

Method      FID↓    PSNR↑
LAMA        33.11   30.34
Proposed    27.50   37.95
Experimental results show that the performance of the model built by the method is superior to that of the method in the prior art, and the effectiveness and superiority of the method provided by the invention are fully verified.

Claims (8)

1. The image restoration model construction method based on the convolution and converter mixed network is characterized by comprising the following steps of:
step 1: collecting image data of the tomb wall painting by using digital equipment; the collected images comprise both intact tomb wall paintings and damaged tomb wall paintings; the collected image data are preprocessed, and the tomb wall paintings with damaged areas are manually annotated with annotation software to obtain the tomb wall painting detection model data set; the intact tomb wall painting images are taken as the training set and verification set of the tomb wall painting repair model;
step 2: inputting the manually marked tomb wall painting data set in the step 1 into a tomb wall painting damage area detection model for training to obtain a trained detection model;
step 3: performing damage detection on the tomb wall painting with damage by using the trained detection model obtained in the step 2, and converting the detection result into a binary mask image to obtain a repair test set image of the tomb wall painting;
step 4: designing a visual feature converter-based damage repair model, wherein the visual feature converter-based damage repair model consists of a Transformer-based structural texture repair model and a content filling model; the structural texture repair model adopts a fast Fourier convolution layer to learn a frequency domain and combines a structural feature encoder formed by a full convolution network;
step 5: training a damage repair model based on a visual characteristic converter by utilizing the grave mural repair data set obtained in the step 1 to obtain a grave mural repair model;
step 6: and (3) testing the trained hybrid network detection and repair model based on convolution and a converter by using the tomb wall painting test set obtained in the step (3) so as to realize automatic repair of the damaged region of the tomb wall painting.
2. The method for constructing an image restoration model based on a convolution and converter hybrid network as claimed in claim 1, wherein in step 2 the tomb wall painting damage area detection model implements the following procedure:
step 21: first, 5 feature vectors of different scales are extracted from the input image by a multi-scale feature extraction network, and the three feature vectors of smallest scale are fed into a feature pyramid for sampling and concatenation; after sampling and concatenation, 5 feature maps of different sizes are obtained; the multi-scale feature extraction network is laterally connected to the feature pyramid, and the low-level features and deep features are fused through the bottom-up connections of the feature pyramid to obtain features of different sizes;
step 22: in the prediction stage, the prediction branch adopts an improved anchor-based target detector comprising a classification layer, a regression layer and a mask coefficient layer; the classification layer outputs confidence scores of the target categories, the regression layer outputs the prediction box results, and the mask coefficient layer predicts K mask coefficients, each coefficient corresponding to one generated prototype; the feature layers of the feature pyramid network obtained in step 21 are taken as the input of the prediction branch, and three groups of data are obtained through a shared convolution layer: the category of each anchor point, the regression coordinates of the prediction box, and the mask coefficients.
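The mask-coefficient mechanism of step 22 (a YOLACT-style design) combines K shared prototype masks with one coefficient vector per detection. A small illustrative sketch, assuming prototypes and coefficients are already predicted:

```python
import numpy as np

def assemble_mask(prototypes, coefficients):
    """Combine K prototype masks with one detection's mask coefficients.

    prototypes:   (K, H, W) prototype masks shared across the whole image
    coefficients: (K,) mask coefficients predicted for a single detection
    Returns an (H, W) instance mask with values in [0, 1].
    """
    # Linear combination of the prototypes, one weight per prototype
    linear = np.tensordot(coefficients, prototypes, axes=1)  # (H, W)
    # Sigmoid maps the combination into a soft [0, 1] mask
    return 1.0 / (1.0 + np.exp(-linear))

K, H, W = 8, 16, 16
protos = np.random.randn(K, H, W)
coeffs = np.random.randn(K)
mask = assemble_mask(protos, coeffs)
print(mask.shape)  # (16, 16)
```

Thresholding the soft mask (e.g. at 0.5) then yields the binary damage mask used by the repair stage.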
3. The method for constructing an image restoration model based on a convolution and converter hybrid network as claimed in claim 1, wherein in step 2 the loss function consists of 3 parts:

Loss = Loss_cls + α·Loss_box + β·Loss_mask

wherein Loss_cls, Loss_box and Loss_mask respectively denote the classification loss, the prediction box loss and the mask loss, and α and β are weight coefficients; the classification loss is a cross-entropy loss, the prediction box loss adopts the Smooth L1 loss, and the mask loss is the pixel-wise binary cross-entropy between the prediction mask and the ground-truth mask:

Loss_mask = BCE(M, M_gt)

wherein BCE denotes the binary cross-entropy loss function, M is the prediction mask, and M_gt is the ground-truth mask.
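The combined detection loss above can be sketched as follows. The scalar classification and box losses are assumed to be precomputed, and the weight values passed in are placeholders, since the claim leaves α and β open:

```python
import numpy as np

def binary_cross_entropy(m, m_gt, eps=1e-7):
    """Pixel-wise BCE between prediction mask m and ground-truth mask m_gt."""
    m = np.clip(m, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(m_gt * np.log(m) + (1 - m_gt) * np.log(1 - m))))

def detection_loss(loss_cls, loss_box, m, m_gt, alpha=1.0, beta=1.0):
    """Loss = Loss_cls + alpha * Loss_box + beta * Loss_mask."""
    return loss_cls + alpha * loss_box + beta * binary_cross_entropy(m, m_gt)

# A perfect mask prediction drives the BCE term to (near) zero
m_gt = np.array([[1.0, 0.0], [0.0, 1.0]])
print(detection_loss(0.5, 0.2, m_gt, m_gt, alpha=2.0, beta=1.0))  # ~0.9
```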
4. The method for constructing an image restoration model based on a convolution and converter hybrid network as claimed in claim 1, wherein in step 4 the loss function consists of four parts:

L = α·L_l1 + β·L_adv + γ·L_fm + μ·L_hrf

wherein α, β, γ and μ are the weights of the respective loss terms; L_fm is the feature matching loss, L_adv consists of the discriminator loss and the generator loss, and L_hrf is a perceptual (visual) loss; here α=10, β=10, γ=100, μ=30.
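The weighted sum above, with the claim's stated weights α=10, β=10, γ=100, μ=30, can be sketched as follows; the four loss terms are assumed to be precomputed scalars:

```python
def repair_loss(l1, adv, fm, hrf, alpha=10.0, beta=10.0, gamma=100.0, mu=30.0):
    """L = alpha*L_l1 + beta*L_adv + gamma*L_fm + mu*L_hrf.

    The default weights are taken directly from the claim; the individual
    terms (l1, adv, fm, hrf) would come from the respective loss networks.
    """
    return alpha * l1 + beta * adv + gamma * fm + mu * hrf

# 10*0.1 + 10*0.2 + 100*0.05 + 30*0.01 = 8.3 (up to float rounding)
print(repair_loss(0.1, 0.2, 0.05, 0.01))
```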
5. An image restoration model construction device based on a convolution and converter hybrid network, characterized by comprising:
a data acquisition module, used for acquiring a tomb wall painting data set;
a tomb wall painting damage detection model module, used for training on the preprocessed tomb wall painting data set to obtain a tomb wall painting damage area detection model, which detects the damaged areas of an input tomb wall painting image and automatically outputs the detection result as a binary image; the output detection result and the corresponding binary image are supplied to the tomb wall painting damaged area repair model;
a tomb wall painting damage repair model module, used for training on the preprocessed undamaged tomb wall painting data set to obtain a tomb wall painting damaged area repair model, which repairs the input damaged tomb wall painting image and its corresponding binary image to obtain a restored tomb wall painting image.
6. An image restoration method based on a deep convolution and visual converter hybrid network, characterized in that a tomb wall painting image to be restored is input into the trained tomb wall painting detection model to obtain the detected damaged area image and its corresponding binary image, and the damaged area image and the binary image are then input into the trained tomb wall painting repair model to obtain the restored tomb wall painting image; the trained tomb wall painting detection model and tomb wall painting repair model are obtained by the method for constructing an image restoration model based on a convolution and converter hybrid network according to any one of claims 1-4.
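The two-stage inference of claim 6 (detection model produces a binary mask, repair model fills the masked region) can be sketched as follows; both models are toy stand-ins here, with the patent's trained networks plugged in at the two call sites in practice:

```python
import numpy as np

def restore_mural(image, detect, repair):
    """End-to-end inference: detect damage, then inpaint it.

    image:  (H, W, 3) mural image with values in [0, 1]
    detect: callable returning a binary (H, W) damage mask
    repair: callable taking (image, mask) and returning the restored image
    """
    mask = detect(image)              # stage 1: binary damage mask
    assert mask.shape == image.shape[:2]
    return repair(image, mask)        # stage 2: fill the masked region

# Toy stand-ins: mark near-black pixels as damaged, fill with the image mean
toy_detect = lambda img: (img.mean(axis=-1) < 0.1).astype(np.uint8)
toy_repair = lambda img, m: np.where(m[..., None].astype(bool), img.mean(), img)

img = np.full((4, 4, 3), 0.5)
img[0, 0] = 0.0                       # simulate one damaged pixel
out = restore_mural(img, toy_detect, toy_repair)
print(out.shape)  # (4, 4, 3)
```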
7. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the method for constructing an image restoration model based on a convolution and converter hybrid network according to any one of claims 1 to 4.
8. A computer-readable storage medium storing program instructions executable by a processor to implement the method for constructing an image restoration model based on a convolution and converter hybrid network according to any one of claims 1 to 4.
CN202310256040.1A 2023-03-08 2023-03-08 Image restoration method, model and device based on convolution and converter hybrid network Pending CN116309155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310256040.1A CN116309155A (en) 2023-03-08 2023-03-08 Image restoration method, model and device based on convolution and converter hybrid network


Publications (1)

Publication Number Publication Date
CN116309155A true CN116309155A (en) 2023-06-23

Family

ID=86821955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310256040.1A Pending CN116309155A (en) 2023-03-08 2023-03-08 Image restoration method, model and device based on convolution and converter hybrid network

Country Status (1)

Country Link
CN (1) CN116309155A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681980A * 2023-07-31 2023-09-01 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
CN116681980B * 2023-07-31 2023-10-20 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
CN117474806A * 2023-12-26 2024-01-30 松立控股集团股份有限公司 Panoramic image restoration method based on global structure coding
CN117474806B * 2023-12-26 2024-04-12 松立控股集团股份有限公司 Panoramic image restoration method based on global structure coding

Similar Documents

Publication Publication Date Title
CN116309155A (en) Image restoration method, model and device based on convolution and converter hybrid network
CN110533086B (en) Semi-automatic image data labeling method
CN113191374B (en) PolSAR image ridge line extraction method based on pyramid attention network
CN108346144A (en) Automatic bridge crack monitoring and recognition method based on computer vision
CN114548278A (en) In-service tunnel lining structure defect identification method and system based on deep learning
CN111414954B (en) Rock image retrieval method and system
CN109978872B (en) White matter microstructure characteristic screening system and method based on white matter fiber tracts
CN112232328A (en) Remote sensing image building area extraction method and device based on convolutional neural network
CN111161224A (en) Casting internal defect grading evaluation system and method based on deep learning
CN113792667A (en) Method and device for automatically classifying properties of buildings in villages and towns based on three-dimensional remote sensing image
CN107247927A (en) Remote sensing image coastline information extraction method and system based on K-T transform
CN112967255A (en) Shield segment defect type identification and positioning system and method based on deep learning
CN114139968A (en) Electric train equipment maintenance training evaluation system based on artificial intelligence and knowledge graph
CN111724358A (en) Concrete quality detection method and system based on image and convolutional neural network
CN114283285A (en) Cross consistency self-training remote sensing image semantic segmentation network training method and device
CN111626358A (en) Tunnel surrounding rock grading method based on BIM picture recognition
CN111951289A (en) BA-Unet-based underwater sonar image data segmentation method
Bruno et al. Decay detection in historic buildings through image-based deep learning
CN110728269A (en) High-speed rail contact net support pole number plate identification method
CN106960433A (en) Full-reference sonar image quality assessment method based on image entropy and edge completeness
CN112966698A (en) Freshwater fish image real-time identification method based on lightweight convolutional network
CN116342542A (en) Lightweight neural network-based steel product surface defect detection method
CN114021422B (en) Underground structure internal defect identification method based on cross-hole radar and deep learning
CN114863274A (en) Surface green net thatch cover extraction method based on deep learning
CN114418929A (en) Weld defect identification method based on consistency multi-scale metric learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination