Background
Restoration of degraded images is a key task in computer vision and a popular current research problem. From the perspective of the restoration method, it can be divided into non-blind restoration and blind restoration. Non-blind restoration recovers an image whose degradation point spread function is known, performing deconvolution by combining the blurred image with a regularization rule; a fully blind restoration algorithm restores the image when both the blur parameters and the noise parameters are unknown, taking only the degraded image as input. Because non-blind restoration requires the point spread function to be predictable, or at least to follow a specific prior distribution, it is difficult to apply to the restoration of real images. From the perspective of the degradation factor, restoration can be divided into motion blur, jitter blur, defocus blur, physical blur, noise interference and atmospheric turbulence blur; as mentioned above, an image blurred by atmospheric turbulence is usually affected by several degradation factors at once and severely degraded, making it the hardest to restore. From the perspective of the input data, restoration methods can be divided into single-frame restoration and multi-frame restoration; multi-frame restoration takes consecutive frames as input to restore the images and can also be used for video restoration.
Traditional methods solve for the clear image by introducing a regularization prior, such as Tikhonov regularization, and then solving an optimization problem. When applied to severely degraded images, these methods suffer from over-simplified models and regularization rules that are hard to determine, so the algorithms are unstable and may fail completely. In recent years, work on image restoration using the strong representation-learning ability of deep convolutional neural networks (CNNs) has grown rapidly and achieved remarkable results; however, learning-based restoration requires large amounts of training data, and model generalization and interpretability are poor. Combining the advantages of the two approaches is another way to solve the problem: one method uses a deep neural network to learn an image prior and then solves for the image with a traditional method; another first restores the image with a traditional method and then enhances its detail information with a deep neural network.
Disclosure of Invention
Based on the above problems, the invention provides a multi-frame image deblurring system based on an ER network and a corresponding method, which address the randomness of atmospheric-turbulence-degraded images while exploiting the obvious complementary information among multiple frames and handling the random differences among those frames.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
A multi-frame image deblurring system based on an ER network comprises:
a receiving instruction module, used for receiving a task instruction created by a user;
an instruction generation module, used by a user to create a task instruction;
an acquisition module, used for receiving a blurred target image uploaded by a user;
and a sharpening processing module, used for sharpening the blurred target image uploaded by the user.
Further, the sharpening processing module comprises:
a degradation model, for generating a first degraded image and a first degraded image sequence;
an ERnet neural network model, comprising a multi-frame complementary information extraction network, an information refinement network and a space-time attention mechanism, for sorting the first degraded images and generating a new second degraded image sequence;
a degraded feature extractor, comprising a first 2D convolutional layer, a first max pooling layer and a first flatten layer, for outputting degraded feature vectors;
a clear feature extractor, comprising a second 2D convolutional layer, a second max pooling layer and a second flatten layer, for outputting clear feature vectors;
a degraded image generator, for outputting a second degraded image from an input degraded feature vector;
a clear image generator, for outputting a clear image corresponding to the second degraded image from an input clear feature vector;
the multi-frame complementary information extraction network comprises a third 3D convolutional layer, a fourth 3D convolutional layer, a 3D transposed convolutional layer, a third max pooling layer and a first residual decomposition block, and is used for extracting the effective information of each frame of an input degraded image sequence and fusing it into a single-channel feature map for output;
the space-time attention mechanism comprises a 3D global average pooling layer, a 3D global max pooling layer, several fully-connected layers, a second residual decomposition block, a fifth 3D convolutional layer, a 2D global average pooling layer, a 2D global max pooling layer and a third 2D convolutional layer, and is used for strengthening the effective information of the input degraded image sequence and outputting a single-channel feature map;
the information refinement network comprises an encoder, a decoder, a third 2D convolutional layer and a fourth 2D convolutional layer, where the encoder comprises 4 third residual decomposition blocks and 4 corresponding down-sampling layers and the decoder comprises 4 up-sampling layers and 4 corresponding fourth residual decomposition blocks, and is used for correcting the output of the multi-frame complementary information extraction network and the space-time attention mechanism.
A multi-frame image deblurring method based on an ER network, adopting the above multi-frame image deblurring system based on the ER network, comprises the following steps:
Step one, constructing a degradation model, comprising the following steps:
S11, constructing a point spread function (PSF) with atmospheric turbulence characteristics, whose formula is:
PSF = exp{-3.44(αfU/r)^(5/3)};
where U denotes the frequency, (u, v) the frequency-domain coordinates, α the wavelength, f the focal length of the optical system, and r the Fried parameter;
S12, expressing the degradation model as:
y = T(x*PSF + N);
where N denotes Poisson noise, * the convolution operation and + the summation operation; the Poisson noise N is governed by the distribution parameter λ;
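As a rough numerical illustration of S11, the sketch below evaluates the PSF attenuation at a few spatial frequencies. The wavelength, focal length, frequency range and Fried-parameter values are assumptions chosen for illustration and are not taken from the invention:

```python
import numpy as np

# Illustrative constants (assumed, not taken from the invention):
alpha = 850e-9              # wavelength α, 850 nm
f = 2.0                     # focal length f of the optical system, metres
U = np.linspace(0, 1e5, 5)  # spatial frequency samples, cycles per metre

def turbulence_psf(U, r):
    """Frequency-domain attenuation exp{-3.44 (α f U / r)^(5/3)}."""
    return np.exp(-3.44 * (alpha * f * U / r) ** (5 / 3))

# A smaller Fried parameter r (stronger turbulence) suppresses high
# spatial frequencies more strongly.
for r in (0.05, 0.1, 0.2):
    print(r, np.round(turbulence_psf(U, r), 4))
```

At zero frequency the attenuation is exactly 1, and for a fixed frequency the attenuation deepens as r shrinks, matching the intuition that turbulence strength grows as the Fried parameter decreases.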
Step two, forming a simulation data set from a plurality of selected clear images of space targets;
step three, inputting the simulation data set of step two into the degradation model to generate a first degraded image and a first degraded image sequence for training;
step four, inputting the first degraded images into the ERnet neural network model to sort the first degraded image sequence, generating a second image sequence, and evaluating the sorting result of the ERnet neural network model;
step five, inputting the second degraded image sequence into the multi-frame complementary information extraction network and the space-time attention mechanism, and adding their output results to obtain the input data of the information refinement network;
and step six, a user inputs a blurred target image and a clear target image is generated.
Further, in step three, the method for generating the first degraded image specifically comprises:
S31, randomly generating several groups of degradation interference within the preset fixed degradation ranges, the preset degradation ranges being (r_x, r_y) and (λ_x, λ_y), where r is selected from (r_x, r_y) and λ is selected from (λ_x, λ_y);
S32, presetting K frames in the input degraded image sequence, i.e. randomly generating K PSFs and K noise realizations within the degradation ranges;
and S33, convolving the K generated PSFs with the clear space-target images respectively to obtain the corresponding K blurred space-target frames, adding each frame to its corresponding noise to generate a first degraded image, and outputting the first degraded image sequence.
Further, in step four, a single-frame first degraded image is input into the ERnet neural network model. The first degraded image passes through the first 2D convolutional layer, the first max pooling layer and the first flatten layer in sequence to output a degraded feature vector; the degraded feature vector is input into the degraded image generator, which outputs a second degraded image. The first degraded image also passes through the second 2D convolutional layer, the second max pooling layer and the second flatten layer in sequence to output a clear feature vector; the clear feature vector is input into the clear image generator, which outputs a clear image corresponding to the second degraded image. Finally a second image sequence is generated.
Further, in step four, K groups of first degraded image sequences are input into the ERnet neural network model frame by frame to obtain a degraded feature vector group dvec and a clear feature vector group svec, with each group of input degraded image sequences preset to contain l frames. Let dvec_k^i denote the i-th degraded feature vector of the k-th degraded feature vector group and svec_k^j denote the j-th clear feature vector of the k-th clear feature vector group, where 0 < k ≤ K. The mean value of each group of clear feature vectors is used as the sharpness index, with the formula:
svec_k = (1/l) Σ_{j=1}^{l} svec_k^j;
where svec_k represents the mean of the k-th group of clear feature vectors. The Euclidean distance between each degraded feature vector of the group and svec_k is then calculated, with the formula:
d_k^i = L_2(dvec_k^i, svec_k);
where L_2 represents the Euclidean distance function. The order of the input first degraded image sequence is denoted rank_inp, and the sorting result of the ERnet neural network model is denoted rank_outp; the difference between them is measured with the weighted Kendall distance, and the average of the weighted Kendall distances is taken as the evaluation result of the ERnet neural network model, denoted M_RN, with the formula:
M_RN = (1/K) Σ_{k=1}^{K} τ(rank_inp, rank_outp);
where τ is a weighted Kendall distance function.
Furthermore, the multi-frame complementary information extraction network combines a third 3D convolutional layer with an f×1×1 convolution kernel and a fourth 3D convolutional layer with a 1×k×k convolution kernel in a residual manner, where the f×1×1 third 3D convolutional layer performs temporal feature learning and the 1×k×k fourth 3D convolutional layer performs spatial feature learning.
Further, the second degraded image sequence is taken as the input of the multi-frame complementary information extraction network. During encoding, the image features are down-sampled once by the third max pooling layer each time they pass through a first residual decomposition block; during decoding, the image features are up-sampled by the 3D transposed convolutional layer and then input into a first residual decomposition block. The third 3D convolutional layer performs pixel-level correction on the input second degraded image sequence, and the fourth 3D convolutional layer fuses the features of the last first residual decomposition block and outputs a 2D single-channel feature map.
Further, the processing of the space-time attention mechanism comprises:
S51, inputting the second degraded image sequence, as the original features, into 3D global average pooling and 3D global max pooling to obtain a first feature vector and a second feature vector, passing them into different fully-connected layers, then adding the outputs of the two fully-connected layers and multiplying the sum by the original multi-frame features to obtain the multi-frame attention features;
S52, inputting the second degraded image sequence, as the original features, into 2D global average pooling and 2D global max pooling to obtain a first one-dimensional vector and a second one-dimensional vector, passing them into different fully-connected layers, then adding the outputs of the two fully-connected layers and multiplying the sum by the multi-frame attention features to obtain the channel attention features;
S53, performing mean and maximum operations on the channel attention features along the spatial dimension to obtain two single-channel feature maps, a spatial mean map and a spatial maximum map;
and S54, concatenating the spatial mean map and the spatial maximum map, applying a 2D convolution and multiplying by the channel attention features to obtain the spatial attention map.
Further, the output results of the multi-frame complementary information extraction network and the space-time attention mechanism are added and used as the input data of the information refinement network. During encoding, the image features enter the corresponding down-sampling layer each time they pass through a third residual decomposition block; during decoding, the image features enter the corresponding fourth residual decomposition block each time they pass through an up-sampling layer. The convolution kernels of the third and fourth 2D convolutional layers are (1,1) and are used to correct the input image features.
Compared with the prior art, the invention has the beneficial effects that:
1. the ERnet neural network model sorts the input multi-frame images by degradation degree, which reduces the complexity of the data pattern and significantly improves the expressive power and performance of the whole model;
2. the residual decomposition blocks markedly reduce the number of network parameters at little cost in performance, so the multi-frame complementary information extraction network, the space-time attention mechanism and the information refinement network together restore atmospheric-turbulence-degraded images with high computational efficiency.
Detailed Description
The invention will be further described with reference to the accompanying drawings. Embodiments of the present invention include, but are not limited to, the following examples.
In this embodiment, a multi-frame image deblurring system based on an ER network comprises:
a receiving instruction module, used for receiving a task instruction created by a user;
an instruction generation module, used by a user to create a task instruction;
an acquisition module, used for receiving a blurred target image uploaded by a user;
and a sharpening processing module, used for sharpening the blurred target image uploaded by the user.
Further, the sharpening processing module comprises:
a degradation model, for generating a first degraded image and a first degraded image sequence;
an ERnet neural network model, comprising a multi-frame complementary information extraction network, an information refinement network and a space-time attention mechanism, for sorting the first degraded images and generating a new second degraded image sequence;
a degraded feature extractor, comprising a first 2D convolutional layer, a first max pooling layer and a first flatten layer, for outputting degraded feature vectors;
a clear feature extractor, comprising a second 2D convolutional layer, a second max pooling layer and a second flatten layer, for outputting clear feature vectors;
a degraded image generator, for outputting a second degraded image from an input degraded feature vector;
a clear image generator, for outputting a clear image corresponding to the second degraded image from an input clear feature vector;
the multi-frame complementary information extraction network comprises a third 3D convolutional layer, a fourth 3D convolutional layer, a 3D transposed convolutional layer, a third max pooling layer and a first residual decomposition block, and is used for extracting the effective information of each frame of an input degraded image sequence and fusing it into a single-channel feature map for output;
the space-time attention mechanism comprises a 3D global average pooling layer, a 3D global max pooling layer, several fully-connected layers, a second residual decomposition block, a fifth 3D convolutional layer, a 2D global average pooling layer, a 2D global max pooling layer and a third 2D convolutional layer, and is used for strengthening the effective information of the input degraded image sequence and outputting a single-channel feature map;
the information refinement network comprises an encoder, a decoder, a third 2D convolutional layer and a fourth 2D convolutional layer, where the encoder comprises 4 third residual decomposition blocks and 4 corresponding down-sampling layers and the decoder comprises 4 up-sampling layers and 4 corresponding fourth residual decomposition blocks, and is used for correcting the output of the multi-frame complementary information extraction network and the space-time attention mechanism.
A multi-frame image deblurring method based on an ER network adopts the above multi-frame image deblurring system based on the ER network and comprises the following steps:
Step one, constructing a degradation model;
the atmospheric turbulence always carries out irregular movement, physical quantities in the movement are random variables of time and space, and need to be described by using a statistical law, therefore, a concept of an atmospheric turbulence structure function is introduced, relevant work about the atmospheric turbulence comprises research of optical astronomical imaging under the influence of the atmospheric turbulence, simulation experiments for turbulence degradation by simulating a Kolomogorov phase screen and the like, and based on the research and the experiments, the invention adopts the following point spread function PSF with atmospheric turbulence characteristics, and the formula is as follows:
PSF=exp{-3.44(αfU/r)5/3};
where U denotes frequency, (U, v) denotes unit pulse, α denotes wavelength, f denotes focal length of the optical system, r denotes friedel parameter, and the degradation model is expressed as:
y=T(x*PSF+N);
wherein, N represents Poisson noise, x represents convolution operation, and + represents summation operation, and Poisson noise N is influenced by distribution parameter lambda, and lambda can be artificially preset according to actual conditions.
Step two, forming a simulation data set from a plurality of selected clear images of space targets;
the clear images of the space targets come from an STK satellite toolkit, the orbit and the attitude of the satellite can be calculated, the position and the actual illumination of each space target are calculated according to real information, texture information and space models of the surfaces of a plurality of space targets are provided, and therefore space visual scenes can be simulated.
Step three, inputting the simulation data set of step two into the degradation model to generate a first degraded image and a first degraded image sequence for training;
In order to better simulate the random changes of the clear space-target images at different moments, several groups of degradation interference must be generated randomly within a fixed degradation range, so a degradation range must be preset first. As step one shows, the point spread function PSF and the Poisson noise N that cause image degradation are determined by r and λ respectively. To explore the value boundaries of r and λ, two no-reference image quality assessment methods, Variance and Brenner, are used to repeatedly compare the scores of the generated degraded images against those of real clear space-target images and adjust the degradation boundary; finally, two degradation ranges, (r_x, r_y) and (λ_x, λ_y), are preset in this embodiment. In summary, as shown in fig. 2, the method for generating the first degraded image sequence comprises:
S31, randomly generating several groups of degradation interference within the preset fixed degradation ranges, where r is selected from (r_x, r_y) and λ is selected from (λ_x, λ_y);
S32, presetting K frames in the input degraded image sequence, i.e. randomly generating K PSFs and K noise realizations within the degradation ranges;
and S33, convolving the K generated PSFs with the clear space-target images respectively to obtain the corresponding K blurred space-target frames, adding each frame to its corresponding noise to generate a first degraded image, and outputting the first degraded image sequence.
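The steps S31 to S33 above can be sketched as follows. The pixel pitch, wavelength, focal length, stand-in image, frame count and degradation ranges are illustrative assumptions, and the operator T in the model y = T(x*PSF + N) is taken to be simple clipping; this is a minimal sketch, not the embodiment's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA, F = 850e-9, 2.0           # assumed wavelength and focal length

def turbulence_psf(n, r):
    """Spatial PSF from the turbulence OTF exp{-3.44(αfU/r)^(5/3)}."""
    fx = np.fft.fftfreq(n, d=1e-5)        # assumed pixel pitch of 10 µm
    U = np.hypot(*np.meshgrid(fx, fx))    # radial spatial frequency
    otf = np.exp(-3.44 * (ALPHA * F * U / r) ** (5 / 3))
    psf = np.abs(np.fft.ifft2(otf))
    return psf / psf.sum()                # normalize to unit energy

def degrade(x, r, lam):
    """One frame of y = T(x * PSF + N), with T taken as clipping."""
    otf = np.fft.fft2(turbulence_psf(x.shape[0], r))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(x) * otf))  # x * PSF
    noisy = blurred + rng.poisson(lam, size=x.shape)       # + N(λ)
    return np.clip(noisy, 0, 255)                          # T(.)

def degraded_sequence(x, K, r_range, lam_range):
    """S31-S33: sample K (r, λ) pairs, emit a K-frame degraded sequence."""
    frames = []
    for _ in range(K):
        r = rng.uniform(*r_range)         # r drawn from (r_x, r_y)
        lam = rng.uniform(*lam_range)     # λ drawn from (λ_x, λ_y)
        frames.append(degrade(x, r, lam))
    return np.stack(frames)

sharp = rng.uniform(0, 255, size=(64, 64))   # stand-in space-target image
seq = degraded_sequence(sharp, K=7, r_range=(0.05, 0.2), lam_range=(1, 5))
print(seq.shape)
```

Because each frame draws its own (r, λ) pair, the K frames differ randomly from one another, which is exactly the property of turbulence-degraded sequences the method exploits.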
Step four, inputting the first degraded images into the ERnet neural network model to sort the first degraded image sequence, generating a second image sequence, and evaluating the sorting result of the ERnet neural network model;
when a first degraded image of a single frame is input into an ERnet neural network model, the first degraded image passes through a degraded feature extractor, wherein after each first 2D convolution layer of the first degraded image, the image features pass through a first maximum pooling layer for down-sampling, and after the last first maximum pooling layer, the image features are subjected to a flat ten operation to output degraded feature vectors, the degraded feature vectors are input into a degraded image generator, and a second degraded image with the same size as the first degraded image is output; similarly, the first degraded image passes through the clear feature extractor, wherein after each second 2D convolution layer of the first degraded image, the image features are downsampled through the second maximum pooling layer, and after the last second maximum pooling layer, the flatten operation is performed on the image features to output a clear feature vector, the clear feature vector is input into the clear image generator, and a clear image with the same size as the first degraded image is output.
In addition, after the K groups of ordered first degraded image sequences are input into the ERnet neural network model frame by frame, the sorting result of the ERnet neural network model also needs to be evaluated; the specific method is as follows:
K groups of first degraded image sequences are input into the ERnet neural network model frame by frame to obtain a degraded feature vector group dvec and a clear feature vector group svec, with each group of input degraded image sequences preset to contain l frames. Let dvec_k^i denote the i-th degraded feature vector of the k-th degraded feature vector group and svec_k^j denote the j-th clear feature vector of the k-th clear feature vector group, where 0 < k ≤ K. The mean value of each group of clear feature vectors is used as the sharpness index, with the formula:
svec_k = (1/l) Σ_{j=1}^{l} svec_k^j;
where svec_k represents the mean of the k-th group of clear feature vectors. The Euclidean distance between each degraded feature vector of the group and svec_k is then calculated, with the formula:
d_k^i = L_2(dvec_k^i, svec_k);
where L_2 represents the Euclidean distance function. The smaller d_k^i is, the less degraded the i-th frame of the k-th group is; the larger it is, the more degraded the frame. The order of the input first degraded image sequence is denoted rank_inp, and the sorting result of the ERnet neural network model is denoted rank_outp. The difference between the input order and the model's sorting result is measured with the weighted Kendall distance, and the average of the weighted Kendall distances is taken as the evaluation result of the ERnet neural network model, denoted M_RN, with the formula:
M_RN = (1/K) Σ_{k=1}^{K} τ(rank_inp, rank_outp);
where τ is a weighted Kendall distance function. The larger M_RN is, the closer the sorting result of the ERnet neural network model is to the order of the input first degraded image sequence and the better the sorting effect; conversely, the smaller M_RN is, the worse the sorting effect.
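The evaluation above can be sketched as follows. Since the exact weighting inside τ is not specified here, the sketch assumes simple hyperbolic position weights and reports an agreement score in [0, 1], with larger meaning closer to the input order, to match the stated meaning of M_RN:

```python
import numpy as np

def sharpness_scores(dvec_group, svec_group):
    """Distance of each degraded feature vector to the group's mean clear
    feature vector; a smaller distance indicates less degradation."""
    svec_mean = np.mean(svec_group, axis=0)
    return [np.linalg.norm(d - svec_mean) for d in dvec_group]  # L_2

def weighted_kendall(rank_inp, rank_outp):
    """Position-weighted Kendall agreement in [0, 1]; 1 means identical
    order. The hyperbolic weights 1/(i+1)+1/(j+1) are an assumed choice."""
    pos = {item: p for p, item in enumerate(rank_outp)}
    agree = total = 0.0
    n = len(rank_inp)
    for i in range(n):
        for j in range(i + 1, n):
            w = 1.0 / (i + 1) + 1.0 / (j + 1)   # early ranks weigh more
            total += w
            if pos[rank_inp[i]] < pos[rank_inp[j]]:  # concordant pair
                agree += w
    return agree / total

def m_rn(input_ranks, output_ranks):
    """M_RN: mean weighted Kendall agreement over the K groups."""
    return float(np.mean([weighted_kendall(a, b)
                          for a, b in zip(input_ranks, output_ranks)]))

print(m_rn([[1, 2, 3, 4]], [[1, 2, 3, 4]]))   # identical order
print(m_rn([[1, 2, 3, 4]], [[4, 3, 2, 1]]))   # fully reversed order
```

A perfectly ordered output scores 1.0 and a fully reversed one 0.0, so RankNet's 0.7123 in the table below indicates substantially better agreement with the true degradation order than the four no-reference baselines.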
Step five, inputting the second degraded image sequence into the multi-frame complementary information extraction network and the space-time attention mechanism, and adding their output results to obtain the input data of the information refinement network;
In the multi-frame complementary information extraction network, an f×k×k 3D convolution kernel generates f·k² parameters. To preserve the characteristics of the 3D convolution kernel while reducing the network parameters, as shown in fig. 3, the multi-frame complementary information extraction network combines a third 3D convolutional layer with an f×1×1 convolution kernel and a fourth 3D convolutional layer with a 1×k×k convolution kernel in a residual manner, where the f×1×1 layer performs temporal feature learning and the 1×k×k layer performs spatial feature learning. The residual decomposition has the following three advantages:
(1) the number of network parameters is greatly reduced: when the number of composite convolution kernels is n, the parameters of a single decomposed 3D convolutional layer change from (f×k²)×n to (f+k²)×n;
(2) the f×1×1 third 3D convolutional layer increases the nonlinearity of the network, which helps to improve its expressive power;
(3) the residual connection speeds up the training of the network.
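A quick check of advantage (1); the kernel sizes f, k and the kernel count n below are illustrative, not values fixed by the embodiment:

```python
# Parameter count for n composite 3D kernels, before and after the
# residual decomposition into f x 1 x 1 and 1 x k x k layers.
def full_params(f, k, n):
    return f * k * k * n           # (f x k^2) x n

def decomposed_params(f, k, n):
    return (f + k * k) * n         # (f + k^2) x n

f, k, n = 3, 3, 64                 # illustrative sizes (assumed)
print(full_params(f, k, n))        # 1728
print(decomposed_params(f, k, n))  # 768
```

For these sizes the decomposition cuts the layer's parameters by more than half, and the saving grows as f and k grow.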
In addition, the multi-frame input to the multi-frame complementary information extraction network contains complementary information, and the effective information of each frame must be extracted from the degradation interference. Since 3D convolutional layers can learn features across several frames and effectively extract the feature information of each frame, in this embodiment the second degraded image sequence is taken as the input of the multi-frame complementary information extraction network. During encoding, the image features are down-sampled once by the third max pooling layer each time they pass through a first residual decomposition block; during decoding, the image features are first up-sampled by the 3D transposed convolutional layer and then input into a first residual decomposition block. The third 3D convolutional layer performs pixel-level correction on the input second degraded image sequence, and the fourth 3D convolutional layer fuses the features of the last first residual decomposition block and outputs a 2D single-channel feature map.
In addition, because the effective information of the input frames differs, the important features must receive higher attention. Note that, owing to the strong randomness of atmospheric turbulence, some frames in an actually collected space-target sequence may be much clearer than the frames at other moments. A space-time attention mechanism is therefore introduced: on the one hand it strengthens the important features of the input frames, and on the other hand it exploits the clear features of such "extra" clear space-target frames. The space-time attention mechanism comprises the following three parts:
S51, a frame attention process: the second degraded image sequence, as the original features, is input into 3D global average pooling and 3D global max pooling to obtain a first feature vector and a second feature vector; these are passed into different fully-connected layers, and the outputs of the two fully-connected layers are added and multiplied by the original multi-frame features to obtain the multi-frame attention features;
S52, a channel attention process: the second degraded image sequence, as the original features, is input into 2D global average pooling and 2D global max pooling to obtain a first one-dimensional vector and a second one-dimensional vector; these are passed into different fully-connected layers, and the outputs of the two fully-connected layers are added and multiplied by the multi-frame attention features to obtain the channel attention features;
S53, a spatial attention process: mean and maximum operations are performed on the channel attention features along the spatial dimension to obtain two single-channel feature maps, a spatial mean map and a spatial maximum map;
and S54, the spatial mean map and the spatial maximum map are concatenated, a 2D convolution is applied, and the result is multiplied by the channel attention features to obtain the spatial attention map.
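S51 to S54 can be sketched on a (T, C, H, W) feature stack as follows. In this sketch the fully-connected layers and the final 2D convolution are replaced by a direct sigmoid gate, so only the pooling arithmetic of the mechanism is shown; this is an assumption of the sketch, not the embodiment's trained layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatiotemporal_attention(x):
    """Sketch of S51-S54 on a (T, C, H, W) stack of frame features."""
    # S51 frame attention: 3D global average + max pooling over (C, H, W)
    frame_w = sigmoid(x.mean(axis=(1, 2, 3)) + x.max(axis=(1, 2, 3)))
    x = x * frame_w[:, None, None, None]     # weight each frame
    # S52 channel attention: 2D global average + max pooling over (T, H, W)
    chan_w = sigmoid(x.mean(axis=(0, 2, 3)) + x.max(axis=(0, 2, 3)))
    x = x * chan_w[None, :, None, None]      # weight each channel
    # S53/S54 spatial attention: mean and max maps along frame and channel
    sp = sigmoid(x.mean(axis=(0, 1)) + x.max(axis=(0, 1)))
    return x * sp[None, None, :, :]          # weight each spatial position

rng = np.random.default_rng(2)
feats = rng.standard_normal((7, 16, 8, 8))   # 7 frames, 16 channels
out = spatiotemporal_attention(feats)
print(out.shape)
```

Each of the three gates lies in (0, 1), so the mechanism rescales, rather than replaces, the input features, and clearer frames receive larger frame weights.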
In addition, the single-channel feature maps output by the spatial attention map and the multi-frame complementary information extraction network are added and used as the input of the information refinement network. During encoding, the image features enter the corresponding down-sampling layer each time they pass through a third residual decomposition block; during decoding, the image features enter the corresponding fourth residual decomposition block each time they pass through an up-sampling layer. The convolution kernels of the third and fourth 2D convolutional layers are (1,1) and are used to correct the input image features. The final output of the information refinement network is a restored image of the same size as an input frame, which is also the final output of the whole system. The information refinement network is introduced for two reasons:
(1) when extracting information, the multi-frame complementary information extraction network and the space-time attention mechanism inevitably carry some degradation information, so their output needs further refinement;
(2) the sorting result of the ERnet neural network model cannot guarantee that every input sequence is perfectly ordered, and the errors this causes need to be compensated.
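The encoder/decoder pass of the information refinement network can be sketched as follows. The residual decomposition blocks and the (1,1) convolutions are replaced by identity mappings, and the skip connections are an assumption of the sketch; only the 4-level down/up-sampling skeleton and the size-preserving property are illustrated:

```python
import numpy as np

def down(x):
    """2x2 average pooling (stand-in for a down-sampling layer)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def up(x):
    """Nearest-neighbour up-sampling (stand-in for an up-sampling layer)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def refine(x, depth=4):
    """Encoder/decoder skeleton of the information refinement network;
    the residual decomposition blocks are identity mappings here."""
    skips = []
    for _ in range(depth):          # encoder: block -> down-sampling layer
        skips.append(x)
        x = down(x)
    for _ in range(depth):          # decoder: up-sampling layer -> block
        x = up(x) + skips.pop()     # assumed skip connection
    return x

img = np.random.default_rng(3).standard_normal((64, 64))
out = refine(img)
print(out.shape)
```

After four down-sampling and four up-sampling stages the output has the same size as the input frame, matching the requirement that the restored image equals the input frame size.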
Step six, a user inputs a blurred target image and a clear target image is generated;
In this embodiment, the sorting effect of the ERnet neural network model is measured experimentally (for convenience, in all tables below the ERnet neural network model is denoted RankNet, the residual decomposition blocks in all network models are denoted RDB, the information refinement network is denoted RN, and the space-time attention mechanism is denoted STAM). During testing, 1000 groups of second degraded image sequences are generated in order of increasing degradation degree, with the frame order of each group preset to {1,2,3,4,5,6,7}. The sorting effect of the ERnet neural network model is computed by the method of step four, and the sorting effect of four classical no-reference image evaluation methods, variance, brenner, laplacian and tenengrad, is also tested on the 1000 groups of first degraded image sequences. The following table shows the average weighted Kendall distance obtained under the five sorting methods:
Methods | MRN
laplacian | 0.0913
tenengrad | 0.1852
variance | 0.6704
brenner | 0.6841
RankNet | 0.7123
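For reference, the four classical no-reference sharpness scores in the table, and a weighted Kendall distance for comparing a predicted frame order against the true one, can be sketched as below. These are generic textbook versions, not the patent's test code; in particular, the exact weighting of the Kendall measure is not specified above (and the table's scores read as "higher is better", so the patent's measure may in fact be a Kendall-type similarity), so the hyperbolic top-weighting here is an assumption.

```python
# Generic no-reference sharpness scores (variance, Brenner, Laplacian,
# Tenengrad) used to rank frames, plus an assumed top-weighted Kendall
# distance between a predicted ordering and a reference ordering.
import numpy as np
from itertools import combinations

def variance_score(img):
    return float(np.var(img))

def brenner_score(img):
    d = img[:, 2:] - img[:, :-2]                 # second-neighbour differences
    return float(np.sum(d * d))

def laplacian_score(img):
    lap = (img[1:-1, 2:] + img[1:-1, :-2] + img[2:, 1:-1] + img[:-2, 1:-1]
           - 4 * img[1:-1, 1:-1])                # 5-point Laplacian
    return float(np.var(lap))

def tenengrad_score(img):
    gx = img[1:-1, 2:] - img[1:-1, :-2]          # crude gradient magnitudes
    gy = img[2:, 1:-1] - img[:-2, 1:-1]
    return float(np.mean(gx * gx + gy * gy))

def rank_frames(frames, score):
    """Frame indices ordered from sharpest (least degraded) downwards."""
    return sorted(range(len(frames)), key=lambda i: score(frames[i]), reverse=True)

def weighted_kendall(predicted, reference):
    """Normalized, position-weighted fraction of discordant pairs (0 = perfect)."""
    pos = {f: i for i, f in enumerate(reference)}
    num = den = 0.0
    for i, j in combinations(range(len(predicted)), 2):
        w = 1.0 / (i + 1) + 1.0 / (j + 1)        # assumed top-weighted scheme
        den += w
        if pos[predicted[i]] > pos[predicted[j]]:  # pair out of reference order
            num += w
    return num / den

rng = np.random.default_rng(0)
sharp = rng.standard_normal((64, 64))
blurred = 0.25 * sum(np.roll(sharp, s, (0, 1)) for s in [(0, 0), (0, 1), (1, 0), (1, 1)])
print(rank_frames([blurred, sharp], variance_score))              # [1, 0]
print(weighted_kendall([1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7]))  # 0.0
```

A perfect ranking scores 0.0 and a fully reversed one scores 1.0 under this distance; top-weighting makes mistakes among the sharpest frames cost more, which matters because the restoration network relies most on the least degraded frames.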
In the following table, 5 groups of output sequences obtained for the input first degraded image sequences under each ranking method are listed, as follows:
In this embodiment, the image restoration effect was also measured; the following table summarizes the results of the ablation study, as follows:
Variations | Model1 | Model2 | Model3 | Model4 | Model5
RDB | | √ | √ | √ | √
RankNet | | | √ | √ | √
RN | | | | √ | √
STAM | | | | | √
PSNR | 25.9156 | 25.7577 | 26.0580 | 26.5717 | 26.9423
SSIM | 0.9336 | 0.9275 | 0.9369 | 0.9489 | 0.9558
Parameters | 10.0828M | 7.4242M | 7.4242M | 19.8844M | 20.1122M
As can be seen from the above table, after the residual decomposition block is used, the difference in the expressive ability of the multi-frame complementary information extraction network, the space-time attention mechanism and the information refinement network is not large; a certain amount of model performance is sacrificed, but the model has significantly fewer parameters, yielding higher computational efficiency and potentially better generalization. After the ERnet neural network model is added, the performance of the whole model improves significantly: ranking the effective information in the input frames improves the expressive ability of the model, which can be explained by ranking reducing the complexity of the data patterns and thereby improving network performance. After the information refinement network is added, the performance improves significantly again; the model's expressive ability becomes stronger, and better features can be extracted to complete the image reconstruction. Finally, the space-time attention mechanism effectively extracts the important features from the input multi-frame image features and effectively compensates the image features extracted by the multi-frame complementary information extraction network; its effect is likewise very clear in the results.
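As a rough illustration of the kind of space-time attention the ablation credits with extracting important features from the stacked frame features, the following PyTorch sketch applies a temporal softmax over the frames followed by a single-channel spatial gate. The exact patented STAM design is not reproduced here; the layer choices and shapes are assumptions.

```python
# Hedged sketch of a spatio-temporal attention mechanism: per-pixel
# softmax weights across the T input frames (temporal attention), then a
# single-channel sigmoid map gating the fused features (spatial attention).
import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.temporal = nn.Conv2d(ch, 1, 1)            # per-frame relevance score
        self.spatial = nn.Conv2d(ch, 1, 3, padding=1)  # single-channel spatial map

    def forward(self, feats):                          # feats: (B, T, C, H, W)
        b, t, c, h, w = feats.shape
        flat = feats.reshape(b * t, c, h, w)
        # temporal attention: softmax over the T frames at every pixel
        scores = self.temporal(flat).reshape(b, t, 1, h, w)
        fused = (feats * torch.softmax(scores, dim=1)).sum(dim=1)  # (B, C, H, W)
        # spatial attention: gate the fused features with a [0, 1] map
        return fused * torch.sigmoid(self.spatial(fused))

attn = SpatioTemporalAttention(ch=16)
out = attn(torch.randn(2, 7, 16, 32, 32))   # 7 frames, as in the test sequences
print(out.shape)   # torch.Size([2, 16, 32, 32])
```

The single-channel spatial map produced here corresponds to the kind of map that, per the description above, is added to the output of the multi-frame complementary information extraction network before refinement.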
Fig. 4 shows the restoration effect of the ERnet neural network model on 32 × 32 patches. As can be seen from Fig. 4, each frame has undergone severe degradation; the first 7 frames are the result of ranking by the ERnet neural network model, and the degradation degree can be seen to become progressively more severe. After passing through the ERnet neural network model, the texture features and structural features in the image are recovered to a great extent.
Because a space target image is mainly affected by two degradation factors, noise and blur, it is difficult to restore. To better analyze and evaluate the performance of the algorithm, 5 representative baseline methods in the current field were selected for experimental comparison: Aittala et al. feed multi-frame 2D images into a neural network simultaneously for image restoration; Sim et al. take multiple frames as input and learn an adaptive kernel for each pixel; fastMBD is a traditional multi-frame deblurring method; CBDnet can restore real degraded images well; and DMPHN is a deep stacked hierarchical multi-patch network for image deblurring.
To ensure fairness in the comparison experiments, all multi-frame methods use the same test sequences as the ERnet neural network model; for the single-frame methods, the least degraded frame of the first degraded image sequence is used as the test data, i.e., r is 0.01 and λ is 35. The baseline methods that require training were retrained on the same training set using the source code and parameters provided by their authors. The following table lists the average PSNR and SSIM of the present invention and the other five baseline methods on the test set, as follows:
Method | PSNR | SSIM
DMPHN | 25.4006 | 0.9308
CBDnet | 25.2954 | 0.9226
fastMBD | 17.2004 | 0.7125
Sim et al. | 26.2736 | 0.9456
Aittala et al. | 26.0386 | 0.9468
RankNet | 26.9423 | 0.9558
As can be seen from the above table, under severe degradation the conventional method that constructs an objective function through regularization, fastMBD, performs worst. The method of Sim et al. constructs a hierarchical network model while learning an adaptive blur kernel for each pixel, which partially solves the problem of spatially inconsistent blur and achieves the 2nd-ranked result. The method of Aittala et al. extracts salient information across multiple frames through a max-pooling operation and also achieves a good result; its SSIM (structural similarity) is especially prominent. DMPHN and CBDnet do not fully exploit the complementary information between frames, owing to the limited expressive ability of their network models, and obtain worse results. Fig. 5 shows the restoration effect on some images in the test set; as shown in Fig. 5, the present invention restores not only clearer images but also richer texture information.
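Since the comparison rests on average PSNR and SSIM, minimal NumPy versions of both metrics are sketched below for reference. The SSIM here is the simplified global form without the usual 11×11 Gaussian window, so its values differ slightly from library implementations such as scikit-image's.

```python
# Minimal PSNR and (simplified, global) SSIM for images in [0, peak].
import numpy as np

def psnr(ref, test, peak=1.0):
    mse = np.mean((ref - test) ** 2)
    return float(10 * np.log10(peak ** 2 / mse))

def ssim_global(ref, test, peak=1.0):
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # standard stabilizers
    mu_x, mu_y = ref.mean(), test.mean()
    var_x, var_y = ref.var(), test.var()
    cov = ((ref - mu_x) * (test - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov + c2)) /
                 ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

rng = np.random.default_rng(1)
clean = rng.random((64, 64))
noisy = np.clip(clean + 0.05 * rng.standard_normal((64, 64)), 0, 1)
print(round(psnr(clean, noisy), 1))   # roughly 26 dB for 5% noise
```

With metrics of this form, the roughly 1 dB PSNR gap between RankNet (26.9423) and the 2nd-ranked Sim et al. (26.2736) corresponds to about a 14% reduction in mean squared error.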
In this embodiment, the ERnet neural network model is also applied to motion-blur reconstruction. The results show that reconstructing motion blur is easier than reconstructing turbulence-blur degradation, and the experiments show that the invention also applies well to motion-blur reconstruction. Degraded images with different angles and blur degrees were simulated on the Urban100, Set5, Set14 and T91 data sets: different angles simulate the direction of object motion or camera shake, and different blur degrees simulate the degree of motion-blur degradation. In the experiments, the motion-blur angle ranges over 40-50 degrees and the blur kernel size over 15-30 pixels; angles and kernel sizes are generated randomly within these two ranges, and clear images are degraded with the resulting blur kernels to generate the training data. Three recent restoration methods, DMPHN, Aittala et al. and Sim et al., which achieve baseline-level performance on motion blur, were selected for comparison; the test set used is BSD100, as shown in the following table:
 | DMPHN | Aittala et al. | Sim et al. | RankNet
SSIM | 0.8948 | 0.9158 | 0.9315 | 0.9460
PSNR | 25.1321 | 26.5183 | 27.3587 | 29.4329
The above table, in conjunction with Figs. 6-7, shows that the present invention is significantly better than the comparison methods on motion blur; the PSNR is improved by nearly 8% over the 2nd-ranked method.
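The data-generation recipe above (random angle in 40-50 degrees, random kernel length in 15-30 pixels, clear image convolved with the resulting kernel) can be sketched as follows. This is a hedged illustration of standard linear motion-blur simulation, not the patent's generator; the line-rasterization and padding details are assumptions.

```python
# Hedged sketch of motion-blur training-pair simulation: build a normalized
# linear motion-blur kernel at a random angle/length and convolve the clean
# image with it (zero-padded 'same'-size convolution, no SciPy needed).
import numpy as np

def motion_blur_kernel(length, angle_deg):
    """Normalized one-pixel-wide line kernel of the given length and angle."""
    k = np.zeros((length, length))
    c = (length - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-c, c, 4 * length):   # rasterize the line densely
        r = int(round(c + t * np.sin(theta)))
        s = int(round(c + t * np.cos(theta)))
        k[r, s] = 1.0
    return k / k.sum()                         # kernel sums to 1 (energy preserving)

def blur(img, kernel):
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2, kh - 1 - kh // 2), (kw // 2, kw - 1 - kw // 2)))
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * pad[i:i + img.shape[0], j:j + img.shape[1]]
    return out

rng = np.random.default_rng(2)
# ranges from the experiment description: 15-30 px kernels, 40-50 degree angles
kernel = motion_blur_kernel(int(rng.integers(15, 31)), float(rng.uniform(40, 50)))
blurred = blur(rng.random((64, 64)), kernel)
print(blurred.shape, round(kernel.sum(), 6))   # (64, 64) 1.0
```

Pairing each blurred output with its clean source yields the kind of (degraded, clear) training pairs the experiments describe.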
The above is an embodiment of the present invention. The specific parameters in the above embodiments and examples serve only to clearly illustrate the inventors' verification process and are not intended to limit the scope of the invention, which is defined by the claims; all equivalent structural changes made using the contents of the specification and drawings of the present invention shall likewise fall within the scope of the present invention.