CN116645432A

CN116645432A - High-quality hologram generating method based on improved ViT network

Info

Publication number: CN116645432A
Application number: CN202310665894.5A
Authority: CN
Inventors: 李燕; 凌玉烨; 徐超; 董振兴
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2023-06-07
Filing date: 2023-06-07
Publication date: 2023-08-25

Abstract

A high-quality hologram generating method based on an improved ViT network. An encoding-decoding architecture is constructed in which the improved Vision Transformer network serves as the encoding part and encodes the target image into its corresponding hologram; in the decoding part, free-space propagation of light is simulated by an angular spectrum propagation algorithm to obtain a reconstructed image of the hologram, and the encoding part of the encoding-decoding architecture is trained iteratively by calculating a loss function between the reconstructed image and the target image; in the online stage, the phase-only hologram generated by the trained encoding-decoding architecture is used to reconstruct a high-quality holographic display image through a holographic display system. By improving the Vision Transformer network so that it attends to the global information of the target image, the invention generates higher-quality holograms and realizes holographic display.

Description

High-quality hologram generating method based on improved ViT network

Technical Field

The invention relates to a technology in the field of image processing, in particular to a high-quality hologram generating method based on an improved ViT (Vision Transformer) network.

Background

Existing deep neural network (DNN)-based computer-generated holography (CGH) algorithms compute holograms by training one or more convolutional neural networks (CNNs) and apply them to holographic display systems. They shorten the time needed to compute high-quality holograms, but their display quality is still lower than that of conventional, time-consuming iterative algorithms. One important reason is that the diffraction of light waves is a cross-domain process from the spatial domain to the frequency domain and has global characteristics, whereas CNNs typically use local convolution operations with limited receptive fields, so it is difficult for them to learn the cross-domain mapping from a target image (spatial domain) to a hologram (frequency domain).

Disclosure of Invention

To address the relatively low display quality of holograms generated by existing CNN-based computer-generated holography, the invention provides a high-quality hologram generating method based on an improved ViT network. By attending to the global information of the target image, the improved Vision Transformer network generates higher-quality holograms and realizes high-quality holographic display, which overcomes the limited receptive field of existing CNN-based CGH algorithms and improves the image quality of the holographic display.

The invention is realized by the following technical scheme:

The invention relates to a high-quality hologram generating method based on an improved ViT network. An encoding-decoding architecture is constructed, the Vision Transformer network is improved for the CGH task, and the improved ViT is used as the encoding part to encode a target image into its corresponding phase-only hologram; in the decoding part, free-space propagation of light is simulated by an angular spectrum propagation algorithm to obtain a reconstructed image of the hologram, and the encoding part of the encoding-decoding architecture is trained iteratively by calculating a loss function between the reconstructed image and the target image; in the online stage, the trained improved Vision Transformer network generates the phase-only hologram, and a high-quality holographic display image is reconstructed through a holographic display system.

Technical effects

The invention uses a trained improved Vision Transformer network to calculate the phase-only hologram of the target display image and to realize holographic display. Compared with existing methods that calculate the phase-only hologram of the target display image with a CNN, the method exploits the ability of the improved Vision Transformer network to capture global features, which improves the quality of the network-computed hologram and significantly improves the quality of the reconstructed image in holographic display.

Drawings

FIG. 1 is a diagram of a network training framework of the present invention;

FIG. 2 is a schematic diagram of an embodiment;

FIG. 3 is a schematic diagram of an optical display system;

FIG. 4 is an effect diagram of the embodiment.

Detailed Description

As shown in FIG. 1 (a), the training framework of the improved Vision Transformer network in the high-quality hologram generating method based on the improved ViT network of this embodiment comprises an encoding part and a decoding part. The encoding part is the improved Vision Transformer network, a U-shaped architecture consisting of four downsampling modules and the corresponding upsampling modules, wherein:

each downsampling module and the corresponding upsampling module comprise two global filtering blocks.

As shown in FIG. 1 (b), the global filtering block comprises two layer normalization units, a global filtering layer and a locally-enhanced feed-forward network (LeFF), wherein: the global filtering layer first converts the input spatial features into the frequency domain through a two-dimensional fast Fourier transform (2D FFT), filters the frequency-domain features with a learnable global filter, and then converts the frequency-domain feature map back into spatial features through a two-dimensional inverse fast Fourier transform (2D IFFT). The global filtering layer effectively enlarges the receptive field of the network and increases its operation speed, so that the quality of the holographic display image obtained from holograms generated by the trained network is significantly improved.
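The FFT, filter, IFFT sequence of the global filtering layer can be sketched in a few lines. This is a minimal NumPy illustration, not the network of the embodiment: the function name `global_filter`, the (H, W, C) layout, the fixed (non-trained) filter values and the choice to keep only the real part of the output are all assumptions made for the example.

```python
import numpy as np

def global_filter(x, weight):
    """Filter a spatial feature map in the frequency domain.

    x:      real array of shape (H, W, C), spatial features
    weight: complex array of shape (H, W, C); stands in for the learnable
            global filter (here it is just a fixed array, an assumption)
    """
    # 2D FFT over the two spatial axes: spatial domain -> frequency domain
    spectrum = np.fft.fft2(x, axes=(0, 1))
    # Element-wise product with the filter, one complex gain per frequency bin
    spectrum = spectrum * weight
    # 2D inverse FFT: frequency domain -> spatial domain
    y = np.fft.ifft2(spectrum, axes=(0, 1))
    # Keeping the real part is a simplification chosen for this sketch
    return y.real

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 4))

# An all-ones filter leaves the features unchanged
y_id = global_filter(x, np.ones((16, 16, 4), dtype=complex))

# A filter that keeps only the zero-frequency bin writes the per-channel
# spatial mean to every pixel: each output pixel depends on all input pixels
dc_only = np.zeros((16, 16, 4), dtype=complex)
dc_only[0, 0, :] = 1.0
y_dc = global_filter(x, dc_only)
```

The second case shows why a single frequency-domain multiplication gives a global receptive field, in contrast to a local convolution kernel.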

As shown in FIG. 1 (c), the decoding part of the network training process in the high-quality hologram generating method based on the improved ViT network is an angular spectrum propagation model: free-space propagation of light is simulated by the angular spectrum propagation algorithm to obtain a simulated reconstructed image of the hologram. The angular spectrum propagation method is:

$u(x,y) = \mathcal{F}^{-1}\left\{ \mathcal{F}\left\{ e^{i\phi(x,y)} \right\} \cdot H(f_x, f_y) \right\}$, with $H(f_x, f_y) = \exp\left( i \frac{2\pi}{\lambda} z \sqrt{1 - (\lambda f_x)^2 - (\lambda f_y)^2} \right)$ for $\sqrt{f_x^2 + f_y^2} < 1/\lambda$ and $H(f_x, f_y) = 0$ otherwise,

wherein: $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the two-dimensional Fourier transform and its inverse, $e^{i\phi(x,y)}$ is the complex amplitude distribution of the diffraction plane, $u(x,y)$ is the complex amplitude distribution of the image plane, $f_x, f_y$ are the spatial frequencies, $\lambda$ is the wavelength, and $z$ is the propagation distance. In this embodiment, the wavelength is set to 543 nm and the propagation distance is set to 7 cm.
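The angular spectrum decoding step can be illustrated with the textbook band-limited transfer function. The sketch below is an assumption-laden NumPy example rather than the code of the embodiment: the function name, the 64 × 64 grid and the suppression of evanescent components are choices made here, while the wavelength, distance and pixel size are the values stated for this embodiment.

```python
import numpy as np

WAVELENGTH = 543e-9   # metres, value stated for this embodiment
DISTANCE = 0.07       # metres (7 cm), value stated for this embodiment
PITCH = 3.74e-6       # metres, SLM pixel size stated for the experiment

def angular_spectrum_propagate(phase, wavelength, z, pitch):
    """Propagate the field exp(i * phase) over a distance z in free space."""
    n_y, n_x = phase.shape
    field = np.exp(1j * phase)                  # phase-only hologram field
    fx = np.fft.fftfreq(n_x, d=pitch)           # spatial frequencies along x
    fy = np.fft.fftfreq(n_y, d=pitch)           # spatial frequencies along y
    FX, FY = np.meshgrid(fx, fy)                # both of shape (n_y, n_x)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    propagating = arg > 0                       # evanescent waves are dropped
    kz = 2 * np.pi / wavelength * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * z), 0.0)
    # Multiply the spectrum by the transfer function and transform back
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# A flat phase is a plane wave: its amplitude stays uniform after propagation
plane = angular_spectrum_propagate(np.zeros((64, 64)), WAVELENGTH, DISTANCE, PITCH)

# A random phase hologram: the reconstructed amplitude is the image estimate
rng = np.random.default_rng(0)
phi = rng.uniform(-np.pi, np.pi, size=(64, 64))
recon = angular_spectrum_propagate(phi, WAVELENGTH, DISTANCE, PITCH)
amplitude = np.abs(recon)
```

With a 3.74 μm pixel pitch, every sampled frequency lies well inside the propagating band at 543 nm, so the transfer function has unit modulus and the total energy of the field is preserved.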

The loss function used in the training framework of the high-quality hologram generating method based on the improved ViT network of this embodiment comprises a mean square error (MSE) term, a perceptual loss function and a total variation (TV) regularization term, specifically:

$\mathcal{L} = \mathcal{L}_{MSE}(\hat{a}, a_{gt}) + \alpha \, \mathcal{L}_{p}\big(\psi(\hat{a}), \psi(a_{gt})\big) + \beta \, \mathcal{L}_{TV}(\phi)$,

wherein: $\hat{a}$ is the amplitude of the reconstructed image, $a_{gt}$ is the amplitude of the target image, $\psi(\cdot)$ is the output of each layer of a pretrained VGG network, $\mathcal{L}_{TV}$ denotes the operation of calculating the total variation, $\phi$ is the calculated phase hologram, $\alpha$ is the weight of the perceptual loss function and $\beta$ is the weight of the total variation regularization term. In this embodiment, the weight of the perceptual loss function is set to 0.025 and the weight of the total variation regularization term is set to 0.001.
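How the three terms combine can be shown with a small NumPy sketch. Several details are assumptions made only for illustration: the pretrained VGG network is replaced by a hypothetical stand-in `toy_features` (the identity plus a 2 × 2 average pooling), the perceptual term is taken as the MSE between feature maps, and the total variation is the anisotropic sum of absolute neighbour differences. Only the weights 0.025 and 0.001 come from the embodiment.

```python
import numpy as np

def mse(a, b):
    """Mean square error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))

def total_variation(phi):
    """Anisotropic TV: sum of absolute differences between neighbours."""
    vertical = np.abs(np.diff(phi, axis=0)).sum()
    horizontal = np.abs(np.diff(phi, axis=1)).sum()
    return float(vertical + horizontal)

def toy_features(a):
    """Hypothetical stand-in for the per-layer outputs of a pretrained VGG."""
    h, w = a.shape
    pooled = a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return [a, pooled]

def hologram_loss(a_rec, a_gt, phi, features=toy_features,
                  alpha=0.025, beta=0.001):
    """MSE term + alpha * perceptual term + beta * TV of the phase hologram."""
    pixel_term = mse(a_rec, a_gt)
    perceptual_term = sum(mse(f_rec, f_gt)
                          for f_rec, f_gt in zip(features(a_rec), features(a_gt)))
    tv_term = total_variation(phi)
    return pixel_term + alpha * perceptual_term + beta * tv_term

a_gt = np.zeros((4, 4))
a_rec = np.ones((4, 4))
flat_phase = np.zeros((4, 4))

loss_mismatch = hologram_loss(a_rec, a_gt, flat_phase)
loss_perfect = hologram_loss(a_gt, a_gt, flat_phase)
tv_step = total_variation(np.array([[0.0, 1.0], [0.0, 1.0]]))
```

In a real training loop the same expression would be evaluated on tensors with automatic differentiation, so that the gradient flows through the propagation model back to the encoding network.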

FIG. 2 is a schematic diagram of the principle of this embodiment. The target image is input into the trained improved Vision Transformer network, and the network outputs the phase-only hologram corresponding to the target image. The hologram is loaded onto the spatial light modulator (SLM) of a holographic display system, and the reconstructed holographic display image can be captured with an industrial camera.

As shown in FIG. 3, in the holographic display system the phase-only hologram is loaded onto a phase-only spatial light modulator; a plane light wave is modulated by the hologram and, after propagating 7 cm, forms a diffraction pattern, namely the reconstructed image; high-order diffracted light is filtered out by a 4f system, and the reconstructed image is captured by an industrial camera at the back focal plane of the 4f system.

FIG. 4 shows a target display image, the phase-only hologram and the holographic display image of this embodiment.

In a specific practical experiment, the network was built with Python 3.8.0 and PyTorch 1.8.0 as the base environment. The DIV2K data set, augmented by horizontal flipping and rotation (3200 images in total), was used as the training set; the image resolution is 1024 × 1024, the pixel size is set to 3.74 μm × 3.74 μm, the wavelength of the laser source is set to 543 nm, and the propagation distance is set to 7 cm. The batch size used for training is 1 and the initial learning rate is 0.001; the network is trained for 50 epochs with an AdamW optimizer with momentum parameters (0.9, 0.999), and a cosine decay strategy is used to reduce the learning rate. The training device is an NVIDIA GeForce RTX 3090 GPU.
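The cosine decay of the learning rate can be written out explicitly. The sketch below uses the common closed form that decays from the initial rate to a floor over the whole run; the floor `lr_min = 0` and the per-epoch (rather than per-step) update are assumptions, since the embodiment states only the initial rate of 0.001, the 50 epochs and the use of a cosine strategy.

```python
import math

def cosine_lr(epoch, total_epochs=50, lr_init=1e-3, lr_min=0.0):
    """Learning rate at a given epoch under cosine decay.

    lr_min is an assumed floor; the source gives only lr_init and the
    number of epochs.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + cosine)

# One learning rate per training epoch
schedule = [cosine_lr(epoch) for epoch in range(50)]
lr_start = schedule[0]        # equals the initial learning rate
lr_middle = cosine_lr(25)     # half of the initial rate at the midpoint
lr_end = cosine_lr(50)        # reaches the floor at the end of training
```

The rate falls slowly at first, fastest around the midpoint, and flattens again near the end, which avoids the abrupt drops of a step schedule.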

TABLE 1

Method	GS	SGD	DPAC	U-Net	The invention
PSNR (dB)	22.53	32.12	26.32	22.76	32.41
SSIM	0.653	0.945	0.895	0.739	0.946
Time (s)	1.127	12.96	0.001	0.006	0.132

As shown in Table 1, the method is compared with the prior art on fifty randomly selected test target images, with the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) as the metrics of display image quality. The PSNR and SSIM of the simulated display images of the method are 32.41 dB and 0.946 respectively, which are 9.65 dB and 0.207 higher than those of the CNN-based U-Net method. Compared with the state-of-the-art (SOTA) iterative method SGD using 500 iterations, the method achieves similar PSNR and SSIM while generating a hologram about 98 times faster.
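The comparison figures follow directly from Table 1, and PSNR itself has a simple definition. The sketch below recomputes the quoted differences and gives a standard PSNR function; the assumption that amplitudes are normalized to a peak value of 1.0 is made here and is not stated in the source.

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; peak=1.0 assumes normalized images."""
    mse = np.mean((reference - test) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

# Values copied from Table 1
psnr_unet, psnr_ours = 22.76, 32.41
ssim_unet, ssim_ours = 0.739, 0.946
time_sgd, time_ours = 12.96, 0.132

psnr_gain = psnr_ours - psnr_unet    # improvement over U-Net in dB
ssim_gain = ssim_ours - ssim_unet    # improvement over U-Net in SSIM
speedup = time_sgd / time_ours       # how many times faster than SGD

# A uniform error of 0.1 on a unit-peak image gives a PSNR of 20 dB
example_psnr = psnr(np.zeros((8, 8)), np.full((8, 8), 0.1))
```

The speed-up works out to roughly 98, which matches the factor quoted for the comparison with the 500-iteration SGD method.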

The foregoing embodiments may be partially modified in numerous ways by those skilled in the art without departing from the principles and spirit of the invention, the scope of which is defined in the claims and not by the foregoing embodiments, and all such implementations are within the scope of the invention.

Claims

1. A high-quality hologram generating method based on an improved ViT network, characterized in that an encoding-decoding architecture is constructed and the improved ViT is used as the encoding part to encode a target image into its corresponding hologram; in the decoding part, free-space propagation of light is simulated by an angular spectrum propagation algorithm to obtain a reconstructed image of the hologram, and the encoding part of the encoding-decoding architecture is trained iteratively by calculating a loss function between the reconstructed image and the target image; and in the online stage, the phase-only hologram generated by the trained encoding-decoding architecture is used to reconstruct a high-quality holographic display image through a holographic display system.

2. The high-quality hologram generating method based on an improved ViT network of claim 1, wherein said improved Vision Transformer network is a U-shaped architecture consisting of four downsampling modules and the corresponding upsampling modules, wherein: each downsampling module and the corresponding upsampling module comprise two global filtering blocks.

3. The high-quality hologram generating method based on an improved ViT network of claim 2, wherein said global filtering block comprises: two layer normalization units, a global filtering layer and a locally-enhanced feed-forward network (LeFF), wherein: the global filtering layer first converts the input spatial features into the frequency domain through a two-dimensional fast Fourier transform (2D FFT), filters the frequency-domain features with a learnable global filter, and then converts the frequency-domain feature map back into spatial features through a two-dimensional inverse fast Fourier transform (2D IFFT).

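4. The high-quality hologram generating method based on an improved ViT network of claim 1, wherein said angular spectrum propagation method is: $u(x,y) = \mathcal{F}^{-1}\left\{ \mathcal{F}\left\{ e^{i\phi(x,y)} \right\} \cdot H(f_x, f_y) \right\}$, with $H(f_x, f_y) = \exp\left( i \frac{2\pi}{\lambda} z \sqrt{1 - (\lambda f_x)^2 - (\lambda f_y)^2} \right)$ for $\sqrt{f_x^2 + f_y^2} < 1/\lambda$ and $H(f_x, f_y) = 0$ otherwise, wherein: $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the two-dimensional Fourier transform and its inverse, $e^{i\phi(x,y)}$ is the complex amplitude distribution of the diffraction plane, $u(x,y)$ is the complex amplitude distribution of the image plane, $f_x, f_y$ are the spatial frequencies, $\lambda$ is the wavelength, and $z$ is the propagation distance.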

5. The high-quality hologram generating method based on an improved ViT network of claim 1, wherein said loss function comprises a mean square error (MSE) term, a perceptual loss function and a total variation (TV) regularization term, specifically: $\mathcal{L} = \mathcal{L}_{MSE}(\hat{a}, a_{gt}) + \alpha \, \mathcal{L}_{p}\big(\psi(\hat{a}), \psi(a_{gt})\big) + \beta \, \mathcal{L}_{TV}(\phi)$, wherein: $\hat{a}$ is the amplitude of the reconstructed image, $a_{gt}$ is the amplitude of the target image, $\psi(\cdot)$ is the output of each layer of a pretrained VGG network, $\mathcal{L}_{TV}$ denotes the operation of calculating the total variation, $\phi$ is the calculated phase hologram, $\alpha$ is the weight of the perceptual loss function, and $\beta$ is the weight of the total variation regularization term.

6. The high-quality hologram generating method based on an improved ViT network of claim 1, wherein said holographic display system comprises, arranged in sequence: a laser source, a beam expander, a linear polarizer, a half mirror (beam splitter) together with an SLM loaded with the phase-only hologram, a first lens, an imaging aperture, a second lens and an industrial camera.