CN115526792A - Point spread function prior-based coding imaging reconstruction method - Google Patents
Publication number: CN115526792A (application CN202211077821.6A, China)
Legal status: Pending
Classifications
- G06T5/73: Deblurring; Sharpening
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06T5/10: Image enhancement or restoration using non-spatial domain filtering
- G06T5/20: Image enhancement or restoration using local operators
- G06T2207/20064: Wavelet transform [DWT]
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
Abstract
The invention discloses a coded imaging reconstruction method based on a point spread function prior, comprising two stages: (1) a trainable inversion stage, in which the measured blurred coded image is mapped by Wiener filtering to an intermediate reconstruction space, with the prior parameters in the forward model of the coding system learned, completing a preliminary decoding; (2) an artifact-correction stage, in which an improved U-Net structure introduces the wavelet transform to perform multi-level frequency-domain filtering, eliminating the residual artifacts of the intermediate reconstructed image and improving perceptual quality. Addressing the long reconstruction time and low definition of lensless coded imaging, the invention builds a lightweight deep convolutional neural network based on a physical model, which effectively reduces the time for network training and image reconstruction, with a smaller memory requirement and a faster convergence rate, while improving the reconstruction quality of the coded image.
Description
Technical Field
The invention belongs to the field of lensless coded imaging, and particularly relates to a coded imaging reconstruction method based on a point spread function prior.
Background
Lensless coded imaging replaces the complex optical components of a traditional camera with a single coding mask, such as a diffractive optical element or a coded aperture, that encodes the scene light, and completes the inversion of the optical imaging process through computational imaging to reconstruct an image of the target scene. This shifts the main imaging burden from the front-end optics to back-end computational reconstruction, avoiding the alignment, integration and manufacturing problems of the complex lens groups in traditional imaging systems, and significantly reduces the thickness, weight and cost of the system. It provides a reasonable and feasible path to light, thin imaging systems such as miniature cameras, with strong demand in security, wearable devices, implantable devices, and Internet-of-Things sensor networks.
The core idea of image reconstruction in coded-mask imaging is to regulate the light field with a designed mask and to recover a clear target scene from the blurred, unfocused pattern using an image reconstruction algorithm in the computational system. At present, most research on coded-mask imaging concerns the mask structure, the imaging model and the application scenario, while the refocused images produced by the back-end scene-reconstruction stage still suffer from artifacts and loss of detail. It is therefore worthwhile to exploit the potential of high-quality back-end reconstruction in coded-mask imaging systems and improve the overall imaging performance.
Currently, back-end reconstruction algorithms fall into two categories: traditional iterative optimization algorithms and neural-network-based deep learning algorithms. Asif et al. proposed the FlatCam system, an amplitude-modulated, separable coded-mask imaging system, and achieved 512 × 512 visible-light image reconstruction with SVD-, BM3D- and TV-based algorithms, but the reconstructed images are of poor quality and the reconstruction targets are simple (1. Asif, M.S., Ayremlou, A., Sankaranarayanan, A., Veeraraghavan, A., & Baraniuk, R.G. (2017). FlatCam: Thin, Lensless Cameras Using Coded Aperture and Computation. IEEE Transactions on Computational Imaging, 3(3), 384-397). Jiachen Wu et al. used a Fresnel zone aperture to encode incoherent light into the form of a wavefront and exploited the sparsity of natural scenes with a compressive-sensing algorithm to effectively eliminate twin-image artifacts. Their method markedly improves the signal-to-noise ratio of single-shot images and advances flat, structurally robust camera architectures that need no strict calibration, but the application of such traditional algorithms is still limited by slow reconstruction speed (2. Wu, J., Zhang, H., Zhang, W., et al. (2020). Single-shot lensless imaging with Fresnel zone aperture and incoherent illumination. Light: Science & Applications, 9, 53). Image reconstruction methods based on deep learning are increasingly popular because of their excellent reconstruction results; however, compared with traditional iterative methods they are difficult to interpret, and there is no structured way to incorporate knowledge of the imaging system. Unfolding optimization occupies the middle ground between classical and deep approaches: a fixed number of iterations of a classical algorithm is interpreted as a deep network, each iteration forming one layer.
Kristina Monakhova et al. developed an alternating direction method of multipliers (ADMM) study for lensless imaging. By varying the number of trainable parameters they proposed several network variants along the spectrum between classical and deep methods, including Le-ADMM, Le-ADMM*, and Le-ADMM-U. The network trades data fidelity against perceived image quality to produce more visually appealing images at the cost of reduced fidelity, but the method's constraints are complex and image detail can be overwhelmed by artifacts (3. Kristina Monakhova, Joshua Yurtsever, Grace Kuo, Nick Antipa, Kyrolos Yanny, and Laura Waller, "Learned reconstructions for practical mask-based lensless imaging," Opt. Express 27, 28075-28090 (2019)).
Summary of the Invention
The invention aims to provide a coded imaging reconstruction method based on a point spread function prior, in order to address, in lensless coded imaging, the slow reconstruction of traditional methods and the low reconstruction accuracy and residual artifacts of deep learning methods; it effectively reduces the time for network training and image reconstruction, with a smaller memory requirement and faster convergence, and improves the reconstruction quality of the coded image.
The technical scheme realizing the purpose of the invention is as follows: a coded imaging reconstruction method based on a point spread function prior, comprising the following steps:
Step 1: simulate or collect a set of lossless target data sets as reference images;
Step 2: simulate or collect a set of coded-image data sets based on a lensless coded imaging system, generate training data pairs of a specified size, and calculate the point spread function of the coding mask at the corresponding size;
Step 3: construct the reconstruction network, which adopts a convolutional neural network based on a point spread function prior and consists of two parts: a Wiener-filtering inversion part based on the point spread function prior, and an artifact-correction part based on a wavelet convolutional neural network; the point spread function of the specified size is input into the filter kernel of the Wiener-filtering inversion part as learnable prior information;
Step 4: construct the loss function of the reconstruction network: the error between the network output and the target image is measured with the negative Pearson correlation coefficient, the loss function being defined as the quotient of the covariance of the two variables and the product of their standard deviations;
Step 5: optimize the wavelet convolutional neural network with the Adam optimizer; set the initial learning rate of the optimization algorithm, multiply it by a decay factor after each training epoch, and set the exponential decay rates of the first- and second-moment estimates and the number of iterations per epoch;
Step 6: train the network for b epochs according to the set hyper-parameters, in two phases: during the first b/2 epochs the Wiener filter kernel is fixed, i.e. the Wiener-filtering inversion part does not participate in back-propagation and only the wavelet convolutional neural network module is trained; after the first b/2 epochs the network reaches a preliminary convergence state, and in the following b/2 epochs the Wiener-filtering module is included in back-propagation, i.e. the parameters of both modules are trained simultaneously, where b is an even number;
Step 7: input the coded images of the test set into the network for prediction and output the reconstructed decoded images.
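The two-phase schedule of Step 6 can be illustrated with a toy update loop. This is a minimal sketch, not the patent's training code: the parameter names (`wiener_kernel`, `cnn_weights`) and the gradient callback are illustrative stand-ins, and plain gradient descent replaces the full network training. During the first b/2 epochs the Wiener kernel receives no updates; it is unfrozen for the remaining b/2 epochs.

```python
import numpy as np

# Toy two-phase training: freeze the Wiener kernel for the first b/2 epochs,
# then train both modules.  grads_fn maps params -> gradients of the loss.
def train_two_phase(params, grads_fn, lr=0.0005, b=4):
    assert b % 2 == 0, "b must be even"
    for epoch in range(b):
        wiener_trainable = epoch >= b // 2      # frozen during first half
        grads = grads_fn(params)
        for name, g in grads.items():
            if name == "wiener_kernel" and not wiener_trainable:
                continue                        # excluded from back-propagation
            params[name] = params[name] - lr * g
    return params
```

With constant unit gradients and lr = 1, the frozen kernel accumulates only b/2 updates while the CNN weights accumulate b, which mirrors the description above.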
Compared with the prior art, the invention has the following remarkable advantages: (1) a fast Wiener-filtering reconstruction, based on the least-squares problem under Tikhonov regularization, realizes the preliminary decoding of the coded image; prior physical information is fully used to establish an inverse physical model, and incorporating this physical model into the deep-learning framework makes the reconstruction process interpretable; (2) for the preliminarily decoded intermediate reconstructed image, the wavelet transform replaces each pooling operation, enlarging the receptive field without losing information; (3) the network contains only 32 convolutional layers, which greatly reduces the number of parameters; it removes image artifacts more effectively while running faster, preserving image detail and improving reconstruction quality.
Drawings
FIG. 1 is a schematic diagram of a system model for validating the present invention.
FIG. 2 is a schematic diagram of the modulation process of the light by the encoding mask in the present invention.
FIG. 3 is a schematic diagram of the structure of the wavelet convolutional neural network in the present invention.
FIG. 4 is a schematic diagram of the overall structure of the method of the present invention.
FIG. 5 compares images reconstructed by the point spread function prior-based coded imaging reconstruction method of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
A point spread function prior-based coding imaging reconstruction method specifically comprises the following steps:
Step 1: collect a set of lossless target data sets as reference images; publicly available high-definition data sets may be used, or the data may be collected independently.
Step 2: build the lensless coded imaging system. FIG. 1 is a schematic diagram of the system model used to verify the invention. The target scene is a lossless image displayed on a display screen; the distance between the screen and the coding mask is about 30 cm, and the distance between the coding mask and the sensor is about 3 mm. The coding mask replaces the traditional lens group and, together with the sensor module, forms a lensless camera system. The verification system adopts a Fresnel zone plate as the reference coding mask, with a zone-plate diameter of about 4.5 mm and a Fresnel constant of 0.325 mm.
A set of coded-image data sets is simulated or collected with the lensless coded imaging system, training data pairs of a specified size are generated, and the point spread functions of the coding masks at the corresponding sizes are measured. For a directly acquired data set, the lossless images on the display screen are captured with the lensless coded imaging system under the above parameters; for a simulated data set, a strict forward-propagation model must be established to simulate the coded image captured on the sensor, as follows.
The principle of lensless coded imaging is as follows: in Fourier optics, the formation of an incoherent image can be viewed as a collection of point sources, each of which produces a shifted copy of the point spread function; since the sources are mutually incoherent, the shifted point spread functions add linearly in intensity at the sensor, and the detected image can be represented as the convolution of the target image with the system point spread function. In the present invention the modulation of incident light by the coding mask is expressed in the form of a point spread function, as shown in FIG. 2, whose intensity pattern varies with the diffraction distance.
For sensor imaging, the simulated coded-image data set is generated as:
Y = C(PSF_z ∗ X + N)
where Y is the simulated image-plane coded image, C is the crop operator, PSF_z is the point spread function of the coding mask, captured on the outgoing light field at distance z from the target, X is the input lossless target image, N is additive noise, and ∗ is the convolution operator. For a broadband light source, the coded image is calculated by integrating the diffracted intensity over multiple wavelengths. Since the image sensor has different sensitivities to light of different wavelengths, the imaging model also accounts for the sensor's specific spectral response curve, so the integral is weighted by the spectral response Q_c(λ):

Y_c = C( ∫_{λ_min}^{λ_max} Q_c(λ) · (PSF_z(λ) ∗ X(λ)) dλ + η )

where [λ_min, λ_max] is the spectral range, PSF_z(λ) is the point spread function for monochromatic light of wavelength λ, X(λ) is the intensity of light of wavelength λ emitted by the screen, Q_c is the spectral response curve of the sensor, and η is the sensor readout noise, usually taken as Gaussian noise η ~ N(0, σ²); Y_c is the coded image captured by the simulated sensor.
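A monochromatic version of this forward model, Y = C(PSF_z ∗ X + N), can be sketched in a few lines. This is an illustration under simplifying assumptions (circular convolution via FFT, a central-window crop for C); the function name and sizes are illustrative, not from the patent.

```python
import numpy as np

# Sketch of the sensor forward model: FFT convolution of the target with the
# mask PSF, additive Gaussian readout noise, then the crop operator C keeps
# the central sensor-sized window.
def simulate_coded_image(x, psf, sensor_shape, sigma=0.01, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    # circular convolution PSF_z * X via the Fourier domain
    y = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(psf, s=x.shape)))
    y = y + rng.normal(0.0, sigma, size=y.shape)     # additive noise N
    r0 = (y.shape[0] - sensor_shape[0]) // 2         # crop operator C
    c0 = (y.shape[1] - sensor_shape[1]) // 2
    return y[r0:r0 + sensor_shape[0], c0:c0 + sensor_shape[1]]
```

A broadband simulation would call this per wavelength, weight each result by Q_c(λ), and sum.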
Step 3: construct the reconstruction network, which adopts a convolutional neural network based on a point spread function prior and consists of two parts: a Wiener-filtering inversion part based on the point spread function prior, and an artifact-correction part based on a wavelet convolutional neural network.
(1) The Wiener-filtering inversion part proceeds as follows:
the imaging model of a mask-based lensless imaging system is typically characterized by a convolution of the scene with the mask shadow, i.e., a point spread function (point spread function):
y=p*x+e
where p is the point spread function, x is the scene irradiance, y is the image formed on the sensor, and e is the measurement noise. To reconstruct x from y using the known p, a common image-recovery approach for lensless imaging is to minimize an objective function, typically consisting of a data-fidelity term and a regularization term:

x̂ = argmin_x ‖p ∗ x − y‖² + γ R(x)

where ‖p ∗ x − y‖² quantifies the data fidelity, the regularization term R(x) introduces prior knowledge to alleviate the ill-posedness of the inverse problem, and the regularization parameter γ controls the relative weight of the two terms; taking R(x) = ‖x‖² gives Tikhonov regularization. The least-squares problem under Tikhonov regularization has a closed-form solution given by Wiener deconvolution:

x̂ = F⁻¹( conj(F(p)) ⊙ F(y) / (|F(p)|² + γ) )
the form of the trainable inversion stage in the convolution case behaves as a learned inversion of the Hadamard product in the fourier domain:
X_interm = F⁻¹( F(W) ⊙ F(Y) )
where X_interm is the output of this stage, Y is the measurement, F(·) and F⁻¹(·) are the Fourier-transform and inverse-Fourier-transform operations, W is the filter learned by the neural network, and ⊙ denotes the Hadamard product. For an N × M measurement, the dimension of W is also N × M. W is initialized using the Fourier transform of the calibrated point spread function, i.e.:

F(W) = H* / (|H|² + K)
where K is a regularization parameter whose initial value is set to 10⁴, H = F(p), p is the input point spread function prior, and * denotes the conjugate operator.
In the reconstruction network, the point spread function of the specified size is input into the filter kernel of the Wiener-filtering module as learnable prior information.
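The initialization and the inversion step above can be sketched directly with FFTs. A minimal NumPy sketch, with illustrative function names: for simplicity the learned filter is kept in the frequency domain (equivalent to applying the spatial-domain W of the text), and in the actual network this filter would be a trainable parameter rather than a fixed array.

```python
import numpy as np

# Wiener initialization F(W) = H* / (|H|^2 + K) with H = F(p), followed by
# the inversion X_interm = F^-1( F(W) ⊙ F(Y) ).
def init_wiener_filter(psf, K=1e4):
    H = np.fft.fft2(psf)
    return np.conj(H) / (np.abs(H) ** 2 + K)        # frequency-domain filter

def wiener_invert(y, W_freq):
    # Hadamard product in the Fourier domain, then inverse transform
    return np.real(np.fft.ifft2(W_freq * np.fft.fft2(y)))
```

For a delta-function PSF and small K the filter is nearly the identity, so the inversion returns the measurement almost unchanged; with a real mask PSF it performs the regularized deconvolution.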
(2) The wavelet convolutional neural network adopts a U-Net architecture and consists of 4 wavelet-transform layers, 32 convolution blocks and 4 inverse-wavelet-transform layers, where each convolution block contains an optional batch-normalization step and a ReLU activation. Since the wavelet transform is invertible, this down-sampling scheme ensures that all information is preserved. In addition, the wavelet transform captures the frequency and position information of the feature map simultaneously, giving good time-frequency localization and detail retention. It strengthens the network's learning of both high- and low-frequency information, which benefits artifact correction. The wavelet convolutional neural network can embed the wavelet transform into any convolutional neural network with pooling, and has a stronger capability for modeling spatial context and inter-subband dependency.
Specifically, the wavelet convolutional neural network consists of an encoder sub-network and a decoder sub-network forming a symmetric U-shaped structure. In the encoder sub-network, feature-map down-sampling is completed by a wavelet-transform layer, 4 convolution blocks are inserted between any 2 wavelet-transform layers, and the output subband feature maps of the wavelet transform serve as the input of the subsequent convolution blocks. Similarly, in the decoder sub-network, the feature map is up-sampled by the inverse wavelet transform, 4 convolution blocks are inserted between any 2 inverse-wavelet-transform layers, and the output subband feature maps of the inverse wavelet transform serve as the input of the subsequent convolution layers. Each convolution block consists of a 3 × 3 convolution, batch normalization, and a rectified linear unit (ReLU) activation. The last convolution block predicts a residual image by convolution alone, without batch normalization or ReLU activation. During up-sampling, the feature maps of the encoder and decoder sub-networks are fused by element-wise summation.
The wavelet transform is implemented as follows:

Taking the Haar wavelet as an example, in the two-dimensional Haar wavelet the low-pass filter is defined as:

f_LL = [ 1 1 ; 1 1 ]

It can be seen that f_LL actually implements a sum-pooling operation. When only the low-frequency subband is considered, the wavelet transform and the inverse wavelet transform play the roles of pooling and up-convolution in the network, respectively. When all subbands are considered, the network avoids the information loss caused by conventional down-sampling, which benefits the recovery result. The reconstruction network also uses the three subband filters f_LH, f_HL and f_HH, defined as:

f_LH = [ -1 -1 ; 1 1 ],  f_HL = [ -1 1 ; -1 1 ],  f_HH = [ 1 -1 ; -1 1 ]

Given an image x of size m × n, the (i, j)-th values x_k(i, j) (k = 1, 2, 3, 4) of the 4 subband maps obtained by the 2-D Haar transform (wavelet transform) are written as:

x_1(i, j) = x(2i-1, 2j-1) + x(2i-1, 2j) + x(2i, 2j-1) + x(2i, 2j)
x_2(i, j) = -x(2i-1, 2j-1) - x(2i-1, 2j) + x(2i, 2j-1) + x(2i, 2j)
x_3(i, j) = -x(2i-1, 2j-1) + x(2i-1, 2j) - x(2i, 2j-1) + x(2i, 2j)
x_4(i, j) = x(2i-1, 2j-1) - x(2i-1, 2j) - x(2i, 2j-1) + x(2i, 2j)

Meanwhile, the inverse wavelet transform process is obtained as:

x(2i-1, 2j-1) = ( x_1(i, j) - x_2(i, j) - x_3(i, j) + x_4(i, j) ) / 4
x(2i-1, 2j)   = ( x_1(i, j) - x_2(i, j) + x_3(i, j) - x_4(i, j) ) / 4
x(2i, 2j-1)   = ( x_1(i, j) + x_2(i, j) - x_3(i, j) - x_4(i, j) ) / 4
x(2i, 2j)     = ( x_1(i, j) + x_2(i, j) + x_3(i, j) + x_4(i, j) ) / 4
the image size used for training is NxNx1, the output inversion graph size of the input image after passing through the wiener filtering module is still NxNx1, the wavelet convolution neural network carries out 4-layer down-sampling characteristic learning operation and 4-layer up-sampling characteristic learning operation on the intermediate reconstructed image, wherein the specific structure of the encoder sub-network is as follows:
First feature layer: the input feature map is down-sampled by the first wavelet-transform layer (4 output channels), and feature learning is then completed by 4 convolution blocks. Each convolution block comprises: (1) a convolution layer with 3 × 3 kernels, 40 output channels, stride 1 and padding 1; (2) a batch-normalization layer with 40 output channels; (3) an activation layer using the ReLU function. The first convolution layer has 4 input channels; the second to fourth convolution layers have 40 input channels.

Second feature layer: the output of the first feature layer is down-sampled by the second wavelet-transform layer (160 output channels), followed by 4 convolution blocks. Each convolution block comprises: (1) a convolution layer with 3 × 3 kernels, 64 output channels, stride 1 and padding 1; (2) a batch-normalization layer with 64 output channels; (3) an activation layer using the ReLU function. The first convolution layer has 160 input channels; the second to fourth convolution layers have 64 input channels.

Third feature layer: the output of the second feature layer is down-sampled by the third wavelet-transform layer (256 output channels), followed by 4 convolution blocks of the same form (3 × 3 convolution with 64 output channels, stride 1, padding 1; batch normalization with 64 output channels; ReLU activation). The first convolution layer has 256 input channels; the second to fourth have 64.

Fourth feature layer: the output of the third feature layer is down-sampled by the fourth wavelet-transform layer (256 output channels), followed by 4 convolution blocks of the same form. The first convolution layer has 256 input channels; the second to fourth have 64.
The specific structure of the decoder subnetwork is:
Fourth feature layer: the output of the fourth feature layer of the encoder is taken as input and passes through 4 convolution blocks, each comprising: (1) a convolution layer with 3 × 3 kernels, stride 1 and padding 1; (2) a batch-normalization layer; (3) an activation layer using the ReLU function. The first to third convolution layers and batch-normalization layers have 64 output channels; the fourth has 256. Up-sampling is then completed by the fourth inverse-wavelet-transform layer, with 64 output channels.

Third feature layer: the output of the third feature layer of the encoder is added to the output of the fourth feature layer of the decoder and taken as input; it passes through 4 convolution blocks, each comprising: (1) a convolution layer with 3 × 3 kernels, 64 input channels, stride 1 and padding 1; (2) a batch-normalization layer; (3) an activation layer using the ReLU function. The first to third convolution layers and batch-normalization layers have 64 output channels; the fourth has 256. Up-sampling is then completed by the third inverse-wavelet-transform layer, with 64 output channels.

Second feature layer: the output of the second feature layer of the encoder is added to the output of the third feature layer of the decoder and taken as input; it passes through 4 convolution blocks, each comprising: (1) a convolution layer with 3 × 3 kernels, 64 input channels, stride 1 and padding 1; (2) a batch-normalization layer; (3) an activation layer using the ReLU function. The first to third convolution layers and batch-normalization layers have 64 output channels; the fourth has 160. Up-sampling is then completed by the second inverse-wavelet-transform layer, with 40 output channels.

First feature layer: the output of the first feature layer of the encoder is added to the output of the second feature layer of the decoder and taken as input; feature learning is completed by 3 convolution blocks, each comprising: (1) a convolution layer with 3 × 3 kernels, 40 input channels, 40 output channels, stride 1 and padding 1; (2) a batch-normalization layer with 40 output channels; (3) an activation layer using the ReLU function. A fourth convolution layer with 3 × 3 kernels, 40 input channels, 4 output channels, stride 1 and padding 1 follows; finally, up-sampling is completed by the first inverse-wavelet-transform layer, with 1 output channel.
a zeroth feature layer: the input of the encoder is added to the output of the first feature layer of the decoder to obtain the final output.
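The wavelet down-sampling used throughout the encoder quadruples the channel count (each input channel becomes 4 sub-band channels), and the decoder's inverse transform undoes it exactly. This can be illustrated with a single-level 2-D Haar transform in numpy; this is only a sketch of the down-/up-sampling mechanism, not the patented implementation, and the wavelet basis actually used by the network is not specified here:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar transform: (H, W) -> (4, H/2, W/2).
    Stacking the four sub-bands as channels is the wavelet down-sampling
    step (1 channel in, 4 channels out, spatial size halved)."""
    a = x[0::2, 0::2]  # even rows, even cols
    b = x[0::2, 1::2]  # even rows, odd cols
    c = x[1::2, 0::2]  # odd rows, even cols
    d = x[1::2, 1::2]  # odd rows, odd cols
    ll = (a + b + c + d) / 2.0  # approximation
    lh = (a - b + c - d) / 2.0  # detail
    hl = (a + b - c - d) / 2.0  # detail
    hh = (a - b - c + d) / 2.0  # detail
    return np.stack([ll, lh, hl, hh])

def haar_idwt2(sub):
    """Inverse Haar transform: (4, H/2, W/2) -> (H, W), the exact
    up-sampling step used in the decoder (4 channels in, 1 channel out)."""
    ll, lh, hl, hh = sub
    h2, w2 = ll.shape
    x = np.empty((2 * h2, 2 * w2))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x
```

Because the transform is orthogonal, no information is lost by the down-sampling, which is why the decoder can recover full resolution without learned up-convolutions.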
Step 4: establish the loss function of the reconstruction network. Denote the encoded picture input to the network as X, the corresponding lossless target image as Y, and the reconstructed image output by the network as T. The error between the network output and the target image is measured with the negative Pearson correlation coefficient, the loss function being the negative of the quotient of the covariance of the two variables and the product of their standard deviations:

Loss(T, Y) = -Σ_{i=1..n} (T_i - T̄)(Y_i - Ȳ) / ( sqrt(Σ_{i=1..n} (T_i - T̄)²) · sqrt(Σ_{i=1..n} (Y_i - Ȳ)²) )

where n is the number of pixels of the input picture, i ranges over the pixels, and T̄ and Ȳ are the mean gray values of T and Y. The loss function takes values in [-1, 0]; the closer it is to -1, the higher the correlation and the better the reconstruction;
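As a concrete illustration, the negative Pearson correlation loss described above can be written in a few lines of numpy (a sketch only; in practice the loss is computed inside a deep-learning framework so it can be back-propagated):

```python
import numpy as np

def npcc_loss(t, y):
    """Negative Pearson correlation coefficient between a reconstruction t
    and a lossless target y: the covariance of the two images divided by
    the product of their standard deviations, negated so that a perfect
    reconstruction gives -1."""
    t = t.ravel().astype(float)
    y = y.ravel().astype(float)
    tc = t - t.mean()  # center the reconstruction
    yc = y - y.mean()  # center the target
    num = np.sum(tc * yc)
    den = np.sqrt(np.sum(tc ** 2)) * np.sqrt(np.sum(yc ** 2))
    return -float(num / den)
```

Note that the loss is invariant to affine changes of brightness and contrast, which is one reason it is attractive for lensless reconstructions whose absolute intensity scale is ambiguous.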
Step 5: the whole network is optimized with the Adam (Adaptive Moment Estimation) optimizer; the initial learning rate lr of the optimization algorithm is set to 0.0005 and is multiplied by a decay factor of 0.8 after each training epoch; the exponential decay rate of the first-moment estimate is 0.9 and that of the second-moment estimate is 0.999; 3200 sample pairs are created, the batch size is 4, and each epoch therefore completes 800 iterations;
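A minimal sketch of the learning-rate schedule implied by these hyper-parameters (initial rate 0.0005, decay factor 0.8 applied once per completed epoch); the function name is illustrative:

```python
def lr_schedule(epoch, lr0=0.0005, decay=0.8):
    """Learning rate in effect after `epoch` completed epochs:
    the initial rate multiplied by the decay factor once per epoch."""
    return lr0 * decay ** epoch
```

In a typical framework this corresponds to an exponential decay scheduler stepped at the end of every epoch, with the Adam moment-decay rates set to (0.9, 0.999) as above.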
Step 6: FIG. 4 is a schematic diagram of the overall structure of the method of the present invention. With reference to the figure, the network is trained with the set hyper-parameters for 40 epochs in two stages: during the first 20 epochs the Wiener filtering kernel is fixed, i.e. the Wiener filtering module does not participate in back-propagation and only the wavelet convolutional neural network module is trained; after these 20 epochs the network reaches a preliminary convergence state, and in the following 20 epochs the Wiener filtering module is included in back-propagation, i.e. the parameters of the two modules are trained simultaneously;
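The two-stage schedule can be sketched as a plain-Python helper that reports which modules participate in back-propagation at a given epoch (module names are illustrative; in a deep-learning framework this corresponds to freezing the Wiener filter's parameters for the first half of training and unfreezing them afterwards):

```python
def modules_to_train(epoch, total_epochs=40):
    """Two-stage schedule: for the first half of training the Wiener
    filtering module is frozen (excluded from back-propagation) and only
    the wavelet CNN is updated; afterwards both modules are trained."""
    if epoch < total_epochs // 2:
        return ["wavelet_cnn"]
    return ["wiener_filter", "wavelet_cnn"]
```

Letting the CNN converge first keeps the calibrated point-spread-function prior intact until the rest of the network is stable enough to refine it.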
Step 7: the finally trained network model is saved, the encoded images of the test set are input into the network for prediction, and the reconstructed decoded images are output.
FIG. 5 is a comparison of images reconstructed by the point spread function prior-based coded imaging reconstruction method of the present invention. On the test set, the method reconstructs a 1024 × 1024 image in 0.033 s on average, with an average reconstruction peak signal-to-noise ratio of 22.15 dB.
Claims (10)
1. A point spread function prior-based coding imaging reconstruction method, characterized by comprising the following specific steps:
step 1: simulating or collecting a lossless target data set as reference images;
step 2: simulating or collecting a set of encoded image data based on a lens-free coded imaging system, generating training data pairs of a specified size, and calculating the point spread function of the coding mask at the corresponding size;
step 3: constructing a reconstruction network, the reconstruction network being a convolutional neural network based on a point spread function prior and consisting of two parts: a Wiener filtering inversion part based on the point spread function prior and an artifact correction part based on a wavelet convolutional neural network, wherein the point spread function of the specified size is input into the filtering kernel of the Wiener filtering inversion part as learnable prior information;
step 4: constructing the loss function of the reconstruction network: the error between the network output and the target image is measured with the negative Pearson correlation coefficient, the loss function being defined as the negative of the quotient of the covariance of the two variables and the product of their standard deviations;
step 5: optimizing the wavelet convolutional neural network with an Adam optimizer, setting the initial learning rate of the optimization algorithm, multiplying it by a decay factor after each training epoch, and setting the exponential decay rate of the first-moment estimate, the exponential decay rate of the second-moment estimate and the number of iterations per epoch;
step 6: training the network with the set hyper-parameters for b epochs, in two stages: during the first b/2 epochs the Wiener filtering kernel is fixed, i.e. the Wiener filtering inversion part does not participate in back-propagation and only the wavelet convolutional neural network module is trained; after the first b/2 epochs the network reaches a preliminary convergence state, and during the remaining b/2 epochs the Wiener filtering module is included in back-propagation, i.e. the parameters of the two modules are trained simultaneously, b being an even number;
step 7: inputting the encoded images of the test set into the network for prediction and outputting reconstructed decoded images.
2. The point spread function prior-based coded imaging reconstruction method of claim 1, wherein: the lens-free coded imaging system comprises a display screen, a coding mask and an image acquisition device arranged on the same horizontal optical path; a lossless target scene is displayed on the display screen, the distance between the display screen and the coding mask is about 30 cm, and the distance between the coding mask and the image acquisition device is 3 mm.
3. The point spread function prior-based coded imaging reconstruction method of claim 1, wherein the simulated encoded image data set in step 2 is generated as:

Y = C(PSF_z * X + N)

where Y is the encoded image on the simulated image plane, C is the crop operator, PSF_z is the point spread function of the coding mask over the outgoing light field, captured at a distance z from the target, X is the input lossless target image, N is additive noise, and * denotes the convolution operator.
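Under the assumptions of circular (FFT-based) convolution, zero-mean Gaussian noise and a center crop, the forward model of claim 3 can be sketched in numpy as follows (the function and parameter names are illustrative, not from the patent):

```python
import numpy as np

def simulate_coded_image(x, psf, noise_sigma=0.01, crop=None, seed=0):
    """Simulate Y = C(PSF_z * X + N): circular FFT convolution of the
    lossless scene x with the mask PSF, additive Gaussian noise N, then
    an optional center crop C of size (h, w)."""
    # convolution theorem: multiply spectra, transform back
    y = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(psf, s=x.shape)))
    y += np.random.RandomState(seed).normal(0.0, noise_sigma, x.shape)
    if crop is not None:
        h, w = crop
        top = (y.shape[0] - h) // 2
        left = (y.shape[1] - w) // 2
        y = y[top:top + h, left:left + w]
    return y
```

With the PSF replaced by a unit impulse and the noise disabled, the simulated measurement reduces to the input scene, which is a convenient sanity check for the pipeline.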
4. The point spread function prior-based coded imaging reconstruction method of claim 1, wherein the specific process of the point spread function prior-based Wiener filtering inversion part in step 3 is:

X_interm = F^{-1}(F(W) ⊙ F(Y))

where X_interm is the output of the Wiener filtering inversion part, Y is the measurement, F(·) and F^{-1}(·) are the Fourier transform and inverse Fourier transform operations respectively, W is the filter learned by the neural network, and ⊙ denotes the Hadamard product; for an N × M measurement, the dimension of W is also N × M; W is initialized from the Fourier transform of the calibrated point spread function, i.e.:

F(W) = H* / (|H|² + K)

where K is a regularization parameter, H = F(p), p is the input point spread function prior, and * denotes the conjugate operator.
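A numpy sketch of this inversion, keeping the filter in the Fourier domain so that the stored array plays the role of F(W) in the formulas above (the regularization constant k and the function names are illustrative):

```python
import numpy as np

def wiener_init(psf, k=1e-2, shape=None):
    """Build the frequency-domain Wiener filter from the calibrated PSF p:
    conj(H) / (|H|^2 + K), with H = F(p) and K a regularization constant.
    This array corresponds to F(W) before any learning takes place."""
    H = np.fft.fft2(psf, s=shape)
    return np.conj(H) / (np.abs(H) ** 2 + k)

def wiener_invert(y, w_freq):
    """X_interm = F^-1( F(W) . F(Y) ): multiply the measurement's spectrum
    element-wise by the filter and transform back to the spatial domain."""
    return np.real(np.fft.ifft2(w_freq * np.fft.fft2(y)))
```

For a well-conditioned PSF and small K this closely inverts the blur; the residual artifacts (noise amplification, ringing) are what the wavelet CNN stage is trained to remove.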
5. The point spread function prior-based coded imaging reconstruction method of claim 1, wherein the wavelet convolutional neural network part in step 3 consists of an encoder sub-network and a decoder sub-network, the two parts forming a symmetrical U-shaped structure.
6. The point spread function prior-based coded imaging reconstruction method of claim 5, wherein the encoder sub-network comprises 4 wavelet transform layers, each wavelet transform layer being followed by 4 convolution blocks; feature map down-sampling is performed by the wavelet transform layers, and each convolution block consists of a 3 × 3 convolution, batch normalization and a rectified linear unit (ReLU).
7. The point spread function prior-based encoded imaging reconstruction method of claim 6, wherein the encoder subnetwork comprises:
a first feature layer: the input feature map is down-sampled by the first-level wavelet transform, giving 4 output channels, and feature learning is then completed through 4 convolution blocks, each convolution block comprising: (1) a convolution layer with 3 × 3 kernels, 40 output channels, stride 1 and padding 1; (2) a batch normalization layer with 40 output channels; (3) an activation layer using the ReLU function; the first convolution layer has 4 input channels, and the second to fourth convolution layers have 40 input channels;
a second feature layer: the output of the first feature layer is down-sampled by the second-level wavelet transform, giving 160 output channels, and feature learning is completed through 4 convolution blocks, each convolution block comprising: (1) a convolution layer with 3 × 3 kernels, 64 output channels, stride 1 and padding 1; (2) a batch normalization layer with 64 output channels; (3) an activation layer using the ReLU function; the first convolution layer has 160 input channels, and the second to fourth convolution layers have 64 input channels;
a third feature layer: the output of the second feature layer is down-sampled by the third-level wavelet transform, giving 256 output channels, and feature learning is completed through 4 convolution blocks, each convolution block comprising: (1) a convolution layer with 3 × 3 kernels, 64 output channels, stride 1 and padding 1; (2) a batch normalization layer with 64 output channels; (3) an activation layer using the ReLU function; the first convolution layer has 256 input channels, and the second to fourth convolution layers have 64 input channels;
a fourth feature layer: the output of the third feature layer is down-sampled by the fourth-level wavelet transform, giving 256 output channels, and feature learning is completed through 4 convolution blocks, each convolution block comprising: (1) a convolution layer with 3 × 3 kernels, 64 output channels, stride 1 and padding 1; (2) a batch normalization layer with 64 output channels; (3) an activation layer using the ReLU function; the first convolution layer has 256 input channels, and the second to fourth convolution layers have 64 input channels.
8. The point spread function prior-based coded imaging reconstruction method of claim 6, wherein the decoder sub-network comprises 4 inverse wavelet transform layers, each inverse wavelet transform layer being followed by 4 convolution blocks; the last convolution block predicts the residual image with a convolution alone, without batch normalization or a ReLU activation function; the inverse wavelet transform layers perform feature map up-sampling, during which the feature maps of the encoder sub-network and the decoder sub-network are fused by element-wise summation.
9. The point spread function prior-based coded imaging reconstruction method of claim 6, wherein the decoder subnetwork comprises:
a fourth feature layer: the output of the fourth feature layer of the encoder is taken as input, and feature learning is completed through 4 convolution blocks, each convolution block comprising: (1) a convolution layer with 3 × 3 kernels, stride 1 and padding 1; (2) a batch normalization layer; (3) an activation layer using the ReLU function; the first to third convolution layers and batch normalization layers have 64 output channels, and the fourth convolution layer and batch normalization layer have 256 output channels; up-sampling is then completed through the fourth-level inverse wavelet transform, giving 64 output channels;
a third feature layer: the output of the third feature layer of the encoder is added to the output of the fourth feature layer of the decoder and taken as the input of the third feature layer of the decoder; feature learning is completed through 4 convolution blocks, each convolution block comprising: (1) a convolution layer with 3 × 3 kernels, 64 input channels, stride 1 and padding 1; (2) a batch normalization layer; (3) an activation layer using the ReLU function; the first to third convolution layers and batch normalization layers have 64 output channels, and the fourth convolution layer and batch normalization layer have 256 output channels; up-sampling is then completed through the third-level inverse wavelet transform, giving 64 output channels;
a second feature layer: the output of the second feature layer of the encoder is added to the output of the third feature layer of the decoder and taken as the input of the second feature layer of the decoder; feature learning is first completed through 4 convolution blocks, each convolution block comprising: (1) a convolution layer with 3 × 3 kernels, 64 input channels, stride 1 and padding 1; (2) a batch normalization layer; (3) an activation layer using the ReLU function; the first to third convolution layers and batch normalization layers have 64 output channels, and the fourth convolution layer and batch normalization layer have 160 output channels; up-sampling is then completed through the second-level inverse wavelet transform, giving 40 output channels;
a first feature layer: the output of the first feature layer of the encoder is added to the output of the second feature layer of the decoder and taken as the input of the first feature layer of the decoder; feature learning is first completed through 3 convolution blocks, each convolution block comprising: (1) a convolution layer with 3 × 3 kernels, 40 input channels, 40 output channels, stride 1 and padding 1; (2) a batch normalization layer with 40 output channels; (3) an activation layer using the ReLU function; the features then pass through a fourth convolution layer with 3 × 3 kernels, 40 input channels, 4 output channels, stride 1 and padding 1; finally, up-sampling is completed through the first-level inverse wavelet transform, giving 1 output channel;
the zeroth characteristic layer: the input of the encoder is added to the output of the first feature layer of the decoder to obtain the final output.
10. The point spread function prior-based coding imaging reconstruction method of claim 1, wherein the loss function constructed in step 4 is specifically:

Loss(T, Y) = -Σ_{i=1..n} (T_i - T̄)(Y_i - Ȳ) / ( sqrt(Σ_{i=1..n} (T_i - T̄)²) · sqrt(Σ_{i=1..n} (Y_i - Ȳ)²) )

where n is the number of pixels of the input picture, i ranges over the pixels, T_i is the gray value of pixel i in the reconstructed image, T̄ is the mean of all pixel gray values of the reconstructed image, Y_i is the gray value of pixel i in the lossless target image, and Ȳ is the mean of all pixel gray values of the lossless target image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211077821.6A CN115526792A (en) | 2022-09-05 | 2022-09-05 | Point spread function prior-based coding imaging reconstruction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115526792A true CN115526792A (en) | 2022-12-27 |
Family
ID=84698027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211077821.6A Pending CN115526792A (en) | 2022-09-05 | 2022-09-05 | Point spread function prior-based coding imaging reconstruction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115526792A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116703728A (en) * | 2023-08-07 | 2023-09-05 | 北京理工大学 | Super-resolution method and system for optimizing system parameters |
CN116703728B (en) * | 2023-08-07 | 2023-10-13 | 北京理工大学 | Super-resolution method and system for optimizing system parameters |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||