CN112308941B - Mutual information-based limited view photoacoustic image reconstruction method - Google Patents

Mutual information-based limited view photoacoustic image reconstruction method

Info

Publication number
CN112308941B
CN112308941B (application CN202011215182.6A)
Authority
CN
China
Prior art keywords
frequency domain
time domain
output
sharing module
information sharing
Prior art date
Legal status
Active
Application number
CN202011215182.6A
Other languages
Chinese (zh)
Other versions
CN112308941A (en)
Inventor
高飞
张佳冬
高峰
Current Assignee
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date
Filing date
Publication date
Application filed by ShanghaiTech University filed Critical ShanghaiTech University
Priority to CN202011215182.6A
Publication of CN112308941A
Application granted
Publication of CN112308941B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06T 5/80
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10132 Ultrasound image

Abstract

The invention discloses a mutual information-based limited-view photoacoustic image reconstruction method. The invention proposes to obtain an image-domain image and a k-space image with time domain and frequency domain reconstruction algorithms and to use them as the inputs of a post-processing network. The time-frequency domain U-shaped network model designed by the invention combines the tissue structure information of the two inputs well and removes artifacts, while the mutual information constraint provides rich prior knowledge for the network. Experiments prove that, for the limited-view photoacoustic reconstruction problem, the method of the invention outperforms current mature and state-of-the-art reconstruction algorithms.

Description

Mutual information-based limited view photoacoustic image reconstruction method
Technical Field
The invention relates to a limited-view photoacoustic image reconstruction method that takes time domain and frequency domain inputs and compensates prior knowledge by means of a mutual information constraint.
Background
Photoacoustic imaging is a non-invasive medical imaging modality that combines the high contrast of optical imaging with the deep penetration of ultrasound, and it has recently attracted widespread attention from researchers at home and abroad. Based on the photoacoustic effect, biological tissue generates ultrasonic signals under optical excitation; after these signals are received by surrounding ultrasonic probes, a photoacoustic image of the tissue can be obtained through a reconstruction algorithm.
As an important photoacoustic imaging modality, photoacoustic computed tomography (PAT) is developing rapidly owing to its fast imaging speed, among other merits. At present, the photoacoustic images obtained by PAT have shown clear advantages in preclinical and clinical applications such as blood oxygen saturation quantification, small animal imaging, and the benign/malignant assessment of breast tumors. The accuracy of the diagnostic result depends largely on the quality of the reconstructed photoacoustic image, which means that the photoacoustic reconstruction algorithm determines the value of photoacoustic equipment in clinical applications.
At present, the quality of photoacoustic images obtained by traditional reconstruction algorithms is low. These reconstruction algorithms often introduce artifact information, which makes diagnosis difficult for physicians. In practical clinical applications in particular, the ultrasonic probe often covers only a portion of the patient because of space and detection environment limitations, resulting in a limited view. Using traditional reconstruction algorithms may then cause the biological tissue information to become mixed with, and indistinguishable from, the artifacts.
Disclosure of Invention
The invention aims to solve the following technical problem: under limited-view conditions, existing photoacoustic image reconstruction algorithms cause information loss and introduce artifact information.
In order to solve the above technical problem, the technical scheme of the invention provides a mutual information-based limited-view photoacoustic image reconstruction method, characterized by comprising the following steps:
a DAS image x_i is obtained from the ultrasonic signals received by the ultrasonic probe using a time domain reconstruction algorithm, and at the same time a k-space image x_k is obtained using a frequency domain reconstruction algorithm;
the DAS image x_i and the k-space image x_k are input into a time-frequency domain U-shaped network model and reconstructed to obtain a photoacoustic image, the time-frequency domain U-shaped network model adopting an encoder-decoder structure:
after entering the encoder, the DAS image x_i undergoes a fast Fourier transform and the real part of the complex result is taken; the real part is concatenated along the channel dimension with the k-space image x_k fed to the encoder and input to the first information sharing module; likewise, after entering the encoder, the k-space image x_k undergoes a fast Fourier transform and the real part of the complex result is taken; the real part is concatenated along the channel dimension with the DAS image x_i fed to the encoder and input to the first information sharing module;
the first time domain output and the first frequency domain output obtained from the first information sharing module pass through the first max pooling layer and serve as the time domain input and the frequency domain input of the second information sharing module; the second time domain output and the second frequency domain output obtained from the second information sharing module pass through the second max pooling layer and serve as the time domain input and the frequency domain input of the third information sharing module; the third time domain output and the third frequency domain output obtained from the third information sharing module pass through the third max pooling layer and serve as the time domain input and the frequency domain input of the fourth information sharing module; after the fourth time domain output and the fourth frequency domain output of the fourth information sharing module pass through the fourth max pooling layer, the fourth frequency domain output is inverse-Fourier-transformed and concatenated with the fourth time domain output along the channel dimension, and the intermediate latent representation z_2 is then obtained by convolution kernel calculation; the intermediate latent representation z_2 is the final output of the encoder;
the first, second, third and fourth information sharing modules perform the same processing on their time domain and frequency domain inputs; defining any one of them as an information sharing module, the processing comprises the following steps:
the information sharing module extracts shallow features of the time domain input and outputs them to the decoder;
the information sharing module interactively fuses the extracted high-level features of the time domain input, the extracted high-level features of the frequency domain input after transformation to the time domain, and the original features of the time domain input, to form the first, second, third or fourth time domain output;
the information sharing module interactively fuses the extracted high-level features of the frequency domain input, the extracted high-level features of the time domain input after transformation to the frequency domain, and the original features of the frequency domain input, to form the first, second, third or fourth frequency domain output;
the decoder restores the intermediate latent representation z_2 output by the encoder to a photoacoustic image.
Preferably, the first max pooling layer, the second max pooling layer, the third max pooling layer and the fourth max pooling layer are all 2×2 max pooling layers.
Preferably, the information sharing module outputs the time domain input directly to extract the shallow features of the time domain input, and this direct output is connected to the corresponding decoder part after two 3×3 convolution kernel calculations so as to preserve the shallow time domain information, with a batch normalization and a ReLU activation operation added after each convolution operation.
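Since the pattern "two 3×3 convolutions, each followed by batch normalization and ReLU" recurs throughout the network, a minimal PyTorch sketch of this building block is given below; the class name DoubleConv and the padding choice are our own assumptions, not taken from the patent.

```python
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by batch normalization and a
    ReLU activation: the repeated building block described above.
    padding=1 (spatial size preserved) is an assumption."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```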
Preferably, within the information sharing module, the time domain input passes through two 3×3 convolution kernel calculations and two residual convolution layer calculations to extract the high-level features of the time domain input, with a batch normalization and a ReLU activation operation added after each convolution operation;
the time domain input is fast-Fourier-transformed within the information sharing module and then passes through two 3×3 convolution kernel calculations and two residual convolution layer calculations to obtain the high-level features of the time domain input after transformation to the frequency domain, with a batch normalization and a ReLU activation operation added after each convolution operation;
the time domain input also undergoes no computation within the information sharing module so as to preserve the original features of the time domain input.
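The "residual convolution layer" is not spelled out further in the text; one plausible reading, sketched below under that assumption, is a 3×3 convolution with batch normalization whose input is added back before the ReLU.

```python
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    """One possible 'residual convolution layer': a 3x3 convolution
    with batch normalization whose input is added back to its output
    before the ReLU. The exact residual design is an assumption."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.conv(x))
```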
Preferably, the high-level features of the time domain input, the high-level features of the frequency domain input after transformation to the time domain, and the original features of the time domain input are concatenated along the channel dimension and then passed through two 3×3 convolution kernel calculations to obtain the first, second, third or fourth time domain output, with a batch normalization and a ReLU activation operation added after each convolution operation.
Preferably, within the information sharing module, the frequency domain input passes through two 3×3 convolution kernel calculations and two residual convolution layer calculations to extract the high-level features of the frequency domain input, with a batch normalization and a ReLU activation operation added after each convolution operation;
the frequency domain input is inverse-fast-Fourier-transformed within the information sharing module and then passes through two 3×3 convolution kernel calculations and two residual convolution layer calculations to obtain the high-level features of the frequency domain input after transformation to the time domain, with a batch normalization and a ReLU activation operation added after each convolution operation;
the frequency domain input also undergoes no computation within the information sharing module so as to preserve the original features of the frequency domain input.
Preferably, the high-level features of the frequency domain input, the high-level features of the time domain input after transformation to the frequency domain, and the original features of the frequency domain input are concatenated along the channel dimension and then passed through two 3×3 convolution kernel calculations to obtain the first, second, third or fourth frequency domain output, with a batch normalization and a ReLU activation operation added after each convolution operation.
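Putting the preceding paragraphs together, one information sharing module can be sketched as follows. The sketch reuses the DoubleConv and ResidualConvBlock helpers from the earlier sketches; the class name, the channel bookkeeping, and taking the real part after each FFT/IFFT follow our reading of the text and are assumptions rather than the patent's own code.

```python
import torch
import torch.nn as nn

class InformationSharingBlock(nn.Module):
    """Sketch of one ISB: a shallow skip path, deep time/frequency
    branches, FFT/IFFT cross branches, and two fusion convolutions.
    Assumes DoubleConv and ResidualConvBlock as sketched earlier."""
    def __init__(self, t_in: int, f_in: int, out_ch: int):
        super().__init__()
        def deep(in_ch):
            return nn.Sequential(DoubleConv(in_ch, out_ch),
                                 ResidualConvBlock(out_ch),
                                 ResidualConvBlock(out_ch))
        self.shallow = DoubleConv(t_in, out_ch)   # skip connection to the decoder
        self.t_deep, self.t2f = deep(t_in), deep(t_in)
        self.f_deep, self.f2t = deep(f_in), deep(f_in)
        self.t_fuse = DoubleConv(2 * out_ch + t_in, out_ch)
        self.f_fuse = DoubleConv(2 * out_ch + f_in, out_ch)

    def forward(self, t, f):
        skip = self.shallow(t)                         # shallow time domain features
        t_feat = self.t_deep(t)                        # high-level time domain features
        t_in_freq = self.t2f(torch.fft.fft2(t).real)   # time -> frequency branch
        f_feat = self.f_deep(f)                        # high-level frequency domain features
        f_in_time = self.f2t(torch.fft.ifft2(f).real)  # frequency -> time branch
        t_out = self.t_fuse(torch.cat([t_feat, f_in_time, t], dim=1))
        f_out = self.f_fuse(torch.cat([f_feat, t_in_freq, f], dim=1))
        return skip, t_out, f_out
```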
Preferably, the intermediate latent representation z_2 undergoes the following steps in the decoder (a generic decoder stage is sketched after step 4):
step 1: the intermediate latent representation z_2 first passes through an up-sampling convolution layer, is then concatenated along the channel dimension with the shallow features output by the fourth information sharing module, passes through two 3×3 convolution kernel calculations, and then through an up-sampling convolution layer operation, with a batch normalization and a ReLU activation operation added after each convolution operation;
step 2: the output of step 1 is concatenated along the channel dimension with the shallow features output by the third information sharing module, then passes through two 3×3 convolution kernel calculations and an up-sampling convolution layer operation, with a batch normalization and a ReLU activation operation added after each convolution operation;
step 3: the output of step 2 is concatenated along the channel dimension with the shallow features output by the second information sharing module, then passes through two 3×3 convolution kernel calculations and an up-sampling convolution layer operation, with a batch normalization and a ReLU activation operation added after each convolution operation;
step 4: the output of step 3 is concatenated along the channel dimension with the shallow features output by the first information sharing module, then passes through two 3×3 convolution kernel calculations, and a final convolution kernel calculation yields the final photoacoustic image, with a batch normalization and a ReLU activation operation added after each convolution operation.
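A generic decoder stage under the steps above can be sketched as follows; using a transposed convolution for the "up-sampling convolution layer" is an assumption, and the DoubleConv helper is the one sketched earlier.

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """One decoder step: 2x2 up-sampling convolution, channel
    concatenation with the matching skip features, then two 3x3
    conv-BN-ReLU layers. The final stage is followed by an extra
    1x1 convolution that maps to the output image."""
    def __init__(self, in_ch: int, up_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, up_ch, kernel_size=2, stride=2)
        self.conv = DoubleConv(up_ch + skip_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([x, skip], dim=1))
```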
Preferably, in the training phase of the time-frequency domain U-shaped network model, a mutual information constraint on the intermediate latent variables is used to compensate prior knowledge.
Preferably, using the mutual information constraint on the intermediate latent variables to compensate prior knowledge specifically comprises the following steps:
In the first step, the reconstruction objective function of the time-frequency domain U-shaped network model is given by the following formula (1):

$$\hat{\theta}=\arg\min_{\theta}\ \mathbb{E}\big[\mathcal{L}\big(f_{\theta}(x_i,x_k),\,y\big)\big] \tag{1}$$

In formula (1), θ represents all trainable parameters in the time-frequency domain U-shaped network model; y represents the real photoacoustic image; 𝔼[·] represents the mathematical expectation; f_θ(·) represents the time-frequency domain U-shaped network model; ℒ(·,·) represents the reconstruction loss.

In the second step, an auxiliary network is pre-trained to provide the intermediate latent representation z_1 used in the mutual information constraint.

In the third step, after the mutual information constraint is added, the objective function of the time-frequency domain U-shaped network model in the training stage becomes the following formula (2):

$$\hat{\theta}=\arg\max_{\theta}\ \Big\{\lambda\, I(z_1,z_2)-\mathbb{E}\big[\mathcal{L}\big(f_{\theta}(x_i,x_k),\,y\big)\big]\Big\} \tag{2}$$

In formula (2), I(·,·) represents mutual information and λ represents the penalty coefficient of the mutual information constraint.
In the fourth step, the variational lower bound of the objective function of formula (2) is optimized: let p(z_1|z_2) denote the true conditional distribution and approximate it by a variational distribution q(z_1|z_2); the mutual information term I(z_1, z_2) is then represented by the following formula (3):

$$I(z_1,z_2)=H(z_1)-H(z_1\mid z_2)=H(z_1)+\mathbb{E}_{z_1,z_2}\big[\log q(z_1\mid z_2)\big]+\mathbb{E}_{z_2}\big[D_{\mathrm{KL}}\big(p(z_1\mid z_2)\,\|\,q(z_1\mid z_2)\big)\big] \tag{3}$$

In formula (3), H(z_1) represents the information entropy of the intermediate latent representation z_1; H(z_1|z_2) represents the conditional entropy of z_1 given the intermediate latent representation z_2; 𝔼_{z_1,z_2}[·] represents the mathematical expectation over z_1 and z_2; D_KL(·‖·) represents the KL divergence. Since z_1 is provided by the fixed pre-trained network, H(z_1) is a constant, and the KL divergence is non-negative; dropping both terms in formula (3) yields the variational lower bound of the reconstruction objective function, given by the following formula (4):

$$\hat{\mathcal{L}}(\theta)=\lambda\,\mathbb{E}_{z_1,z_2}\big[\log q(z_1\mid z_2)\big]-\mathbb{E}\big[\mathcal{L}\big(f_{\theta}(x_i,x_k),\,y\big)\big] \tag{4}$$

In the fifth step, a Gaussian distribution is selected as the variational distribution q(z_1|z_2), as shown in the following formula (5):

$$q(z_1\mid z_2)=\mathcal{N}\big(z_1\,\big|\,\mu(z_2),\,\sigma^{2}(z_2)\big) \tag{5}$$

In formula (5), μ(z_2) represents the mean of the Gaussian distribution, computed from the intermediate latent variable z_2 by a 1×1 convolution layer; σ²(z_2) represents the variance of the Gaussian distribution, computed per channel using the following formula (6):

$$\sigma_c^{2}=\log\big(1+e^{\alpha_c}\big)+\varepsilon,\qquad \alpha\in\mathbb{R}^{C},\ c=1,\dots,C \tag{6}$$

In formula (6), C, H and W represent the channel number, height and width of z_1 and z_2; ε is a positive constant; ℝ^C represents the real space of dimension C, in which the learnable parameter α lives.

In the sixth step, after substituting formula (6) into formula (5) and taking the logarithm, the variational distribution q(z_1|z_2) gives the following formula (7):

$$\log q(z_1\mid z_2)=-\sum_{c=1}^{C}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(\log\sigma_c+\frac{\big(z_{1,c,h,w}-\mu_{c,h,w}(z_2)\big)^{2}}{2\sigma_c^{2}}\right)+\mathrm{const} \tag{7}$$

In formula (7), const is a constant. Formula (7) expresses the intractable mutual information term as a variational mutual information term that is easy to solve and implement; through formula (7), the intermediate latent variable of the time-frequency domain U-shaped network model can learn more supervision information from the pre-trained encoder, achieving the effect of prior knowledge compensation.
The invention proposes to obtain an image-domain image and a k-space image with time domain and frequency domain reconstruction algorithms and to use them as the inputs of a post-processing network. The time-frequency domain U-shaped network model designed by the invention combines the tissue structure information of the two inputs well and removes artifacts, while the mutual information constraint provides rich prior knowledge for the network. Experiments prove that, for the limited-view photoacoustic reconstruction problem, the method of the invention outperforms current mature and state-of-the-art reconstruction algorithms.
Drawings
FIG. 1 is an overall schematic diagram of DuDoUnet;
FIG. 2 is a schematic diagram of the structure of an ISB;
FIG. 3 is an overall schematic diagram of a mutual information constraint framework;
FIG. 4 shows the results of a simulation experiment (three regions with evident reconstruction differences are marked and enlarged below each image).
Detailed Description
The invention will be further illustrated below with reference to specific examples. It should be understood that these examples are only intended to illustrate the invention and not to limit its scope. Furthermore, it should be understood that, after reading the teachings of the invention, those skilled in the art may make various changes and modifications to the invention, and such equivalents likewise fall within the scope defined by the claims appended hereto.
Aiming at the limited-view photoacoustic reconstruction problem, the invention provides a mutual information-based limited-view photoacoustic image reconstruction method, which comprises the following steps:
A DAS image x_i is obtained from the ultrasonic signals received by the ultrasonic probe using a time domain reconstruction algorithm, and at the same time a k-space image x_k is obtained using a frequency domain reconstruction algorithm. The two reconstruction algorithms reconstruct the same biological tissue information but introduce different artifact information, which teaches the post-processing network, at the input level, which parts are tissue and which are artifacts.
In order to further distinguish the two kinds of information, the invention designs a variant of Unet, namely a time-frequency domain U-shaped network (hereinafter the DuDoUnet model); the overall structure is shown in FIG. 1.
The DuDoUnet model takes as inputs the DAS image x_i obtained by the time domain reconstruction algorithm and the k-space image x_k obtained by the frequency domain reconstruction algorithm. In FIG. 1, FFT and IFFT denote the fast Fourier transform and the inverse fast Fourier transform, respectively, and Concat denotes concatenation of inputs along the channel dimension. This allows the image-domain and k-space inputs to share information and distinguish artifacts from each other. The whole DuDoUnet model is an encoder-decoder structure, where the encoder architecture and operational flow are as follows:
The DAS image x_i has size 128×128×1 and the k-space image x_k has size 128×128×2, where the two channels are the real and imaginary parts of the k-space image x_k. The DAS image x_i is fast-Fourier-transformed and the real part of the complex result is taken, giving a size of 128×128×1; after channel concatenation with the k-space image x_k, the size becomes 128×128×3. Likewise, the k-space image x_k is fast-Fourier-transformed and the real part of the complex result is taken, giving a size of 128×128×1; after channel concatenation with the DAS image x_i, the size becomes 128×128×2.
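Under the sizes above, the input stage can be sketched as follows. Treating the two k-space channels as one complex image before its FFT, and the assignment of the two concatenations to the time domain and frequency domain ports of the first information sharing module, are our assumptions.

```python
import torch

def prepare_dual_domain_inputs(x_i: torch.Tensor, x_k: torch.Tensor):
    """x_i: DAS image (B, 1, 128, 128); x_k: k-space image (B, 2, 128, 128)
    with real and imaginary channels. Returns the 3-channel and
    2-channel concatenations described in the text."""
    x_i_fft = torch.fft.fft2(x_i).real                  # (B, 1, 128, 128)
    branch_a = torch.cat([x_i_fft, x_k], dim=1)         # (B, 3, 128, 128)

    x_k_complex = torch.complex(x_k[:, 0:1], x_k[:, 1:2])
    x_k_fft = torch.fft.fft2(x_k_complex).real          # (B, 1, 128, 128)
    branch_b = torch.cat([x_k_fft, x_i], dim=1)         # (B, 2, 128, 128)
    return branch_a, branch_b
```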
The time-frequency domain inputs then enter the first information sharing module (Information Sharing Block, hereinafter ISB). The specific structure of the ISB is shown in FIG. 2.
The ISB has two inputs: a time domain input and a frequency domain input. Within the ISB, the time domain input serves two different tasks:
Task 1: direct output. This part is used to extract shallow features in the time domain.
Task 2: split into three parts for time domain feature extraction and time-to-frequency domain feature conversion, further comprising:
task 2.1: two 3 x 3 convolution kernel calculations and two residual convolution layer calculations are performed. Batch normalization and ReLU activation operations are additionally added after each convolution operation. This part is used to extract high-level features of the time domain.
Task 2.2: and directly outputting. This section is used to preserve the original features of the temporal input layer.
Task 2.3: after fast fourier transformation, two 3×3 convolution kernel calculations and two residual convolution layer calculations are performed. This part is used to extract the high-level features after the time domain transformation to the frequency domain.
It is noted that although task 1 and task 2.2 are identical, the corresponding output operations are different and play a different role.
The frequency domain input of the ISB completes only one task, namely task 3:
Task 3: split into three parts for frequency domain feature extraction and frequency-to-time domain feature conversion, further comprising:
Task 3.1: two 3×3 convolution kernel calculations and two residual convolution layer calculations are performed. This part extracts the high-level features of the frequency domain input.
Task 3.2: direct output. This part preserves the original features of the frequency domain input.
Task 3.3: after an inverse fast Fourier transform, two 3×3 convolution kernel calculations and two residual convolution layer calculations are performed. This part extracts the high-level features of the frequency domain input after transformation to the time domain.
The ISB has three outputs: a connection output, a time domain output, and a frequency domain output:
Connection output: the output of task 1 is passed through two 3×3 convolution kernel calculations and then connected to the corresponding decoder part, preserving the shallow time domain information.
Time domain output: the outputs of tasks 2.1, 2.2 and 3.3 are concatenated along the channel dimension and output after two 3×3 convolution kernel calculations, ensuring interactive fusion of the time domain information.
Frequency domain output: the outputs of tasks 3.1, 3.2 and 2.3 are concatenated along the channel dimension and output after two 3×3 convolution kernel calculations, ensuring interactive fusion of the frequency domain information.
The ISB is a universal module within the DuDoUnet model: it adapts to different numbers of input channels and outputs features with the required number of channels. Its time-frequency domain transformations allow the photoacoustic image information of the image domain and k-space to be shared, while keeping the time domain and frequency domain tasks from interfering with each other.
After the first ISB, the first connection output, first time domain output and first frequency domain output all have size 128×128×64; after a 2×2 max pooling layer, the time domain and frequency domain outputs become 64×64×64. These two outputs serve as the time domain and frequency domain inputs of the second ISB; after the second ISB's computation, the second connection output, second time domain output and second frequency domain output all have size 64×64×128. The second time domain and frequency domain outputs pass through a 2×2 max pooling layer and become 32×32×128. They then serve as the time domain and frequency domain inputs of the third ISB; after the third ISB's computation, the third outputs have size 32×32×256. The third time domain and frequency domain outputs pass through a 2×2 max pooling layer and become 16×16×256. They then serve as the time domain and frequency domain inputs of the fourth ISB; after the fourth ISB's computation, the fourth outputs have size 16×16×512. The fourth time domain and frequency domain outputs pass through a 2×2 max pooling layer and become 8×8×512. The fourth frequency domain output is inverse-Fourier-transformed and concatenated with the fourth time domain output along the channel dimension, giving a size of 8×8×1024. After two 3×3 convolution kernel calculations, the intermediate latent representation z_2, i.e., the final output of the encoder, is obtained with size 8×8×512.
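The encoder tail just described can be sketched with the DoubleConv helper from earlier; the class name is our own.

```python
import torch
import torch.nn as nn

class EncoderBottleneck(nn.Module):
    """Sketch of the encoder tail: the pooled fourth frequency domain
    output is brought back to the time domain by an inverse FFT,
    concatenated with the pooled fourth time domain output, and reduced
    by two 3x3 convolutions to z2 (512 x 8 x 8 for 128 x 128 inputs)."""
    def __init__(self, ch: int = 512):
        super().__init__()
        self.conv = DoubleConv(2 * ch, ch)

    def forward(self, t4, f4):
        f_time = torch.fft.ifft2(f4).real
        return self.conv(torch.cat([t4, f_time], dim=1))
```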
The specific structure and operational flow of the decoder are as follows:
The intermediate latent representation z_2 passes through a 2×2 up-sampling convolution layer and becomes 16×16×512; it is concatenated along the channel dimension with the fourth connection output of the fourth ISB, giving an output of size 16×16×1024. This output passes through two 3×3 convolution kernel calculations and becomes 16×16×512, then through a 2×2 up-sampling convolution layer operation to become 32×32×256. It is concatenated along the channel dimension with the third connection output of the third ISB, giving an output of size 32×32×512. This output passes through two 3×3 convolution kernel calculations and becomes 32×32×256, then through a 2×2 up-sampling convolution layer operation to become 64×64×128. It is concatenated along the channel dimension with the second connection output of the second ISB, giving an output of size 64×64×256. This output passes through two 3×3 convolution kernel calculations and becomes 64×64×128, then through a 2×2 up-sampling convolution layer operation to become 128×128×64. It is concatenated along the channel dimension with the first connection output of the first ISB, giving an output of size 128×128×128. This output passes through two 3×3 convolution kernel calculations and becomes 128×128×64. Finally, the output passes through one 1×1 convolution kernel calculation, giving a final output of size 128×128×1, the same size as the ground-truth image (GT).
We use the real photoacoustic image y as the supervision label, and denote the DuDoUnet model by f_θ(·), where θ represents all trainable parameters of the DuDoUnet model. The overall reconstruction objective function of DuDoUnet can be expressed as:

$$\hat{\theta}=\arg\min_{\theta}\ \mathbb{E}\big[\mathcal{L}\big(f_{\theta}(x_i,x_k),\,y\big)\big] \tag{1}$$

In the experiments, we selected the mean square error (MSE) as the reconstruction loss function ℒ.
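In code, equation (1) with the MSE loss amounts to the following sketch; `model` stands for f_θ and is a placeholder name.

```python
import torch.nn.functional as F

def reconstruction_loss(model, x_i, x_k, y):
    """Equation (1) with the mean square error loss used in the
    experiments; `model` is the DuDoUnet f_theta (placeholder name)."""
    y_hat = model(x_i, x_k)
    return F.mse_loss(y_hat, y)
```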
Considering that the limited-view photoacoustic inputs can only learn prior knowledge through the supervision labels, the gradient-vanishing problem easily occurs as the network deepens; meanwhile, if the decoder of the network is strong enough, the encoder cannot learn useful information, and the reconstruction model degenerates into a generative model. To address this, the invention proposes a mutual information constraint on the intermediate latent variables to compensate prior knowledge during model training. The overall framework is shown in FIG. 3.
In order for the encoder of the DuDoUnet model to learn rich prior knowledge, an auxiliary network (an auto-encoder in our experiments) is pre-trained to provide an intermediate latent representation z_1, i.e., the output of the auto-encoder's encoder; mutual information is then used to constrain z_1 and the intermediate latent representation z_2 extracted by the DuDoUnet model (the output of the DuDoUnet encoder). The mutual information constraint ensures that the DuDoUnet encoder not only learns prior knowledge from the supervision labels but also learns additional prior knowledge through the intermediate latent representation, achieving the effect of prior knowledge compensation.
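The architecture of the auxiliary auto-encoder is not specified in the text, so the minimal stand-in below is purely illustrative; its only relevant property is that z_1 matches the 512×8×8 shape of z_2.

```python
import torch.nn as nn

class AuxiliaryAutoencoder(nn.Module):
    """Illustrative stand-in for the pre-trained auxiliary auto-encoder
    that supplies z1; trained to reconstruct the ground-truth image y."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 2, stride=2),
        )

    def forward(self, y):
        z1 = self.encoder(y)            # (B, 512, 8, 8) for 128x128 input
        return self.decoder(z1), z1
```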
After the mutual information constraint is added, the overall objective function in the model training stage can be expressed as:

$$\hat{\theta}=\arg\max_{\theta}\ \Big\{\lambda\, I(z_1,z_2)-\mathbb{E}\big[\mathcal{L}\big(f_{\theta}(x_i,x_k),\,y\big)\big]\Big\} \tag{2}$$

In formula (2), I(·,·) denotes mutual information and λ denotes the penalty coefficient of the mutual information constraint.

All parameters of the model can be obtained by maximizing equation (2). However, directly solving equation (2) is very difficult, so we only optimize its variational lower bound. We use p(z_1|z_2) to denote the true conditional distribution; since p(z_1|z_2) is usually unknown, we use a variational distribution q(z_1|z_2) as an approximate substitute for it. The mutual information term can thus be expressed as:

$$I(z_1,z_2)=H(z_1)-H(z_1\mid z_2)=H(z_1)+\mathbb{E}_{z_1,z_2}\big[\log q(z_1\mid z_2)\big]+\mathbb{E}_{z_2}\big[D_{\mathrm{KL}}\big(p(z_1\mid z_2)\,\|\,q(z_1\mid z_2)\big)\big] \tag{3}$$

In formula (3), H(z_1) denotes the information entropy of the intermediate latent representation z_1; H(z_1|z_2) denotes the conditional entropy of z_1 given the intermediate latent representation z_2; 𝔼_{z_1,z_2}[·] denotes the mathematical expectation over z_1 and z_2; D_KL(·‖·) denotes the KL divergence. Because z_1 is an intermediate latent representation provided by the pre-trained network, z_1 is fixed during training and H(z_1) is a constant; the KL divergence is non-negative. We can therefore drop both terms and obtain the variational lower bound of the objective function shown in equation (4):

$$\hat{\mathcal{L}}(\theta)=\lambda\,\mathbb{E}_{z_1,z_2}\big[\log q(z_1\mid z_2)\big]-\mathbb{E}\big[\mathcal{L}\big(f_{\theta}(x_i,x_k),\,y\big)\big] \tag{4}$$

We choose a Gaussian distribution as the variational distribution q(z_1|z_2):

$$q(z_1\mid z_2)=\mathcal{N}\big(z_1\,\big|\,\mu(z_2),\,\sigma^{2}(z_2)\big) \tag{5}$$

where μ(z_2) and σ²(z_2) denote the mean and variance of the Gaussian distribution. The mean μ(z_2) is obtained from the intermediate latent variable z_2 by a 1×1 convolution layer. Computing the variance σ²(z_2) with the same method would cause instability in network training, so we instead compute the variance per channel with equation (6):

$$\sigma_c^{2}=\log\big(1+e^{\alpha_c}\big)+\varepsilon,\qquad \alpha\in\mathbb{R}^{C},\ c=1,\dots,C \tag{6}$$

In formula (6), C, H and W denote the channel number, height and width of z_1 and z_2; ε is a positive constant; ℝ^C denotes the real space of dimension C, in which the learnable parameter α lives.

Substituting formula (6) into formula (5) and taking the logarithm, the variational mutual information term can be expressed as:

$$\log q(z_1\mid z_2)=-\sum_{c=1}^{C}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(\log\sigma_c+\frac{\big(z_{1,c,h,w}-\mu_{c,h,w}(z_2)\big)^{2}}{2\sigma_c^{2}}\right)+\mathrm{const} \tag{7}$$

In formula (7), const is a constant. Unlike MSE, formula (7) allows the network to impose penalty constraints of different strengths on different channels; when the variance σ²(z_2) is a constant equal to 1, formula (7) and MSE are equivalent up to a constant. Formula (7) expresses the intractable mutual information term as a variational term that is easy to solve and to implement in code. Through formula (7), the intermediate latent variable of the DuDoUnet model can learn more supervision information from the pre-trained encoder, achieving the effect of prior knowledge compensation.
To illustrate the effectiveness of the invention, we performed experimental verification on a public data set. We picked a total of 2000 samples, of which 1500 are used for training and the remaining 500 for testing. We use the k-Wave photoacoustic simulation toolbox to generate photoacoustic signals and obtain x_i and x_k through the time domain and frequency domain reconstruction algorithms. We then obtain the reconstructed photoacoustic image using the mutual information-constrained DuDoUnet model. In this experiment, the model parameters λ and ε were each set to 1.
To further compare the effectiveness of the proposed method and model, we choose the commonly used reconstruction models Unet, Ynet and DEUnet for comparison. Unet is divided into Unet#1 and Unet#2: Unet#1 takes only x_i as input, while Unet#2 takes both x_i and x_k but uses only one encoder. The Ynet and DEUnet models have dual-encoder structures, in which x_i and x_k are each processed by a dedicated encoder.
We evaluate the reconstruction results quantitatively using the structural similarity (SSIM) and the peak signal-to-noise ratio (PSNR). (The quantitative comparison table appears only as an image in the original document and is not reproduced here.)
Even without the mutual information constraint, the reconstruction results of the DuDoUnet model exceed those of the other comparison models; after the mutual information constraint is added, the model obtains even better reconstruction results.
We further compare the reconstruction results visually through the reconstructed photoacoustic images shown in FIG. 4. As can be seen from FIG. 4, under the same input conditions, DuDoUnet and DuDoUnet+MI restore the structural information of the blood vessels better.
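For reference, the PSNR/SSIM evaluation can be performed with scikit-image; the sketch below assumes 2-D float images with a meaningful data range.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(y_true: np.ndarray, y_pred: np.ndarray):
    """PSNR/SSIM as used in the quantitative comparison."""
    data_range = float(y_true.max() - y_true.min())
    psnr = peak_signal_noise_ratio(y_true, y_pred, data_range=data_range)
    ssim = structural_similarity(y_true, y_pred, data_range=data_range)
    return psnr, ssim
```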

Claims (10)

1. A mutual information-based limited-view photoacoustic image reconstruction method, characterized by comprising the following steps:
a DAS image x_i is obtained from the ultrasonic signals received by the ultrasonic probe using a time domain reconstruction algorithm, and at the same time a k-space image x_k is obtained using a frequency domain reconstruction algorithm;
the DAS image x_i and the k-space image x_k are input into a time-frequency domain U-shaped network model and reconstructed to obtain a photoacoustic image, the time-frequency domain U-shaped network model adopting an encoder-decoder structure:
after entering the encoder, the DAS image x_i undergoes a fast Fourier transform and the real part of the complex result is taken; the real part is concatenated along the channel dimension with the k-space image x_k fed to the encoder and input to the first information sharing module; after entering the encoder, the k-space image x_k undergoes a fast Fourier transform and the real part of the complex result is taken; the real part is concatenated along the channel dimension with the DAS image x_i fed to the encoder and input to the first information sharing module;
the first time domain output and the first frequency domain output obtained from the first information sharing module pass through the first max pooling layer and serve as the time domain input and the frequency domain input of the second information sharing module; the second time domain output and the second frequency domain output obtained from the second information sharing module pass through the second max pooling layer and serve as the time domain input and the frequency domain input of the third information sharing module; the third time domain output and the third frequency domain output obtained from the third information sharing module pass through the third max pooling layer and serve as the time domain input and the frequency domain input of the fourth information sharing module; after the fourth time domain output and the fourth frequency domain output of the fourth information sharing module pass through the fourth max pooling layer, the fourth frequency domain output is inverse-Fourier-transformed and concatenated with the fourth time domain output along the channel dimension, and the intermediate latent representation z_2 is then obtained by convolution kernel calculation, the intermediate latent representation z_2 being the final output of the encoder;
the first, second, third and fourth information sharing modules perform the same processing on their time domain and frequency domain inputs; defining any one of them as an information sharing module, the processing comprises the following steps:
the information sharing module extracts shallow features of the time domain input and outputs them to the decoder;
the information sharing module interactively fuses the extracted high-level features of the time domain input, the extracted high-level features of the frequency domain input after transformation to the time domain, and the original features of the time domain input, to form the first, second, third or fourth time domain output;
the information sharing module interactively fuses the extracted high-level features of the frequency domain input, the extracted high-level features of the time domain input after transformation to the frequency domain, and the original features of the frequency domain input, to form the first, second, third or fourth frequency domain output;
the decoder restores the intermediate latent representation z_2 output by the encoder to a photoacoustic image.
2. The mutual information-based limited-view photoacoustic image reconstruction method according to claim 1, wherein the first, second, third and fourth max pooling layers are all 2×2 max pooling layers.
3. The mutual information-based limited-view photoacoustic image reconstruction method according to claim 1, wherein the information sharing module outputs the time domain input directly to extract the shallow features of the time domain input, and the direct output is connected to the corresponding decoder part after two 3×3 convolution kernel calculations so as to preserve the shallow time domain information, a batch normalization and a ReLU activation operation being added after each convolution operation.
4. The mutual information-based limited-view photoacoustic image reconstruction method according to claim 1, wherein, within the information sharing module, the time domain input passes through two 3×3 convolution kernel calculations and two residual convolution layer calculations to extract the high-level features of the time domain input, a batch normalization and a ReLU activation operation being added after each convolution operation;
the time domain input is fast-Fourier-transformed within the information sharing module and then passes through two 3×3 convolution kernel calculations and two residual convolution layer calculations to obtain the high-level features of the time domain input after transformation to the frequency domain, a batch normalization and a ReLU activation operation being added after each convolution operation;
the time domain input also undergoes no computation within the information sharing module so as to preserve the original features of the time domain input.
5. The mutual information-based limited-view photoacoustic image reconstruction method according to claim 1, wherein the high-level features of the time domain input, the high-level features of the frequency domain input after transformation to the time domain, and the original features of the time domain input are concatenated along the channel dimension and then passed through two 3×3 convolution kernel calculations to obtain the first, second, third or fourth time domain output, a batch normalization and a ReLU activation operation being added after each convolution operation.
6. The mutual information-based limited-view photoacoustic image reconstruction method according to claim 1, wherein, within the information sharing module, the frequency domain input passes through two 3×3 convolution kernel calculations and two residual convolution layer calculations to extract the high-level features of the frequency domain input, a batch normalization and a ReLU activation operation being added after each convolution operation;
the frequency domain input is inverse-fast-Fourier-transformed within the information sharing module and then passes through two 3×3 convolution kernel calculations and two residual convolution layer calculations to obtain the high-level features of the frequency domain input after transformation to the time domain, a batch normalization and a ReLU activation operation being added after each convolution operation;
the frequency domain input also undergoes no computation within the information sharing module so as to preserve the original features of the frequency domain input.
7. The mutual information-based limited-view photoacoustic image reconstruction method according to claim 1, wherein the high-level features of the frequency domain input, the high-level features of the time domain input after transformation to the frequency domain, and the original features of the frequency domain input are concatenated along the channel dimension and then passed through two 3×3 convolution kernel calculations to obtain the first, second, third or fourth frequency domain output, a batch normalization and a ReLU activation operation being added after each convolution operation.
8. The mutual information-based limited-view photoacoustic image reconstruction method according to claim 1, wherein the intermediate latent representation z_2 undergoes the following steps in the decoder:
step 1: the intermediate latent representation z_2 first passes through an up-sampling convolution layer, is then concatenated along the channel dimension with the shallow features output by the fourth information sharing module, passes through two 3×3 convolution kernel calculations, and then through an up-sampling convolution layer operation, a batch normalization and a ReLU activation operation being added after each convolution operation;
step 2: the output of step 1 is concatenated along the channel dimension with the shallow features output by the third information sharing module, then passes through two 3×3 convolution kernel calculations and an up-sampling convolution layer operation, a batch normalization and a ReLU activation operation being added after each convolution operation;
step 3: the output of step 2 is concatenated along the channel dimension with the shallow features output by the second information sharing module, then passes through two 3×3 convolution kernel calculations and an up-sampling convolution layer operation, a batch normalization and a ReLU activation operation being added after each convolution operation;
step 4: the output of step 3 is concatenated along the channel dimension with the shallow features output by the first information sharing module, then passes through two 3×3 convolution kernel calculations, and a final convolution kernel calculation yields the final photoacoustic image, a batch normalization and a ReLU activation operation being added after each convolution operation.
9. The mutual information-based limited-view photoacoustic image reconstruction method according to claim 1, wherein, in the training phase of the time-frequency domain U-shaped network model, a mutual information constraint on the intermediate latent variables is used to compensate prior knowledge.
10. The mutual information-based limited-view photoacoustic image reconstruction method according to claim 1, wherein using the mutual information constraint on the intermediate latent variables to compensate prior knowledge specifically comprises the following steps:
in the first step, the reconstruction objective function of the time-frequency domain U-shaped network model is given by the following formula (1):

$$\hat{\theta}=\arg\min_{\theta}\ \mathbb{E}\big[\mathcal{L}\big(f_{\theta}(x_i,x_k),\,y\big)\big] \tag{1}$$

in formula (1), θ represents all trainable parameters in the time-frequency domain U-shaped network model; y represents the real photoacoustic image; 𝔼[·] represents the mathematical expectation; f_θ(·) represents the time-frequency domain U-shaped network model; ℒ(·,·) represents the reconstruction loss;

in the second step, an auxiliary network is pre-trained to provide the intermediate latent representation z_1 used in the mutual information constraint;

in the third step, after the mutual information constraint is added, the objective function of the time-frequency domain U-shaped network model in the training stage is represented by the following formula (2):

$$\hat{\theta}=\arg\max_{\theta}\ \Big\{\lambda\, I(z_1,z_2)-\mathbb{E}\big[\mathcal{L}\big(f_{\theta}(x_i,x_k),\,y\big)\big]\Big\} \tag{2}$$

in formula (2), I(·,·) represents mutual information and λ represents the penalty coefficient of the mutual information constraint;
in the fourth step, the variational lower bound of the objective function of formula (2) is optimized: p(z_1|z_2) denotes the true conditional distribution and is approximated by the variational distribution q(z_1|z_2); the mutual information term I(z_1, z_2) is then represented by the following formula (3):

$$I(z_1,z_2)=H(z_1)-H(z_1\mid z_2)=H(z_1)+\mathbb{E}_{z_1,z_2}\big[\log q(z_1\mid z_2)\big]+\mathbb{E}_{z_2}\big[D_{\mathrm{KL}}\big(p(z_1\mid z_2)\,\|\,q(z_1\mid z_2)\big)\big] \tag{3}$$

in formula (3), H(z_1) represents the information entropy of the intermediate latent representation z_1; H(z_1|z_2) represents the conditional entropy of z_1 given the intermediate latent representation z_2; 𝔼_{z_1,z_2}[·] represents the mathematical expectation over z_1 and z_2; D_KL(·‖·) represents the KL divergence; since z_1 is provided by the fixed pre-trained network, H(z_1) is a constant, and the KL divergence is non-negative, so both terms are dropped and the variational lower bound of the reconstruction objective function is obtained as the following formula (4):

$$\hat{\mathcal{L}}(\theta)=\lambda\,\mathbb{E}_{z_1,z_2}\big[\log q(z_1\mid z_2)\big]-\mathbb{E}\big[\mathcal{L}\big(f_{\theta}(x_i,x_k),\,y\big)\big] \tag{4}$$

in the fifth step, a Gaussian distribution is selected as the variational distribution q(z_1|z_2), as shown in the following formula (5):

$$q(z_1\mid z_2)=\mathcal{N}\big(z_1\,\big|\,\mu(z_2),\,\sigma^{2}(z_2)\big) \tag{5}$$

in formula (5), μ(z_2) represents the mean of the Gaussian distribution, computed from the intermediate latent variable z_2 by a 1×1 convolution layer; σ²(z_2) represents the variance of the Gaussian distribution, computed per channel using the following formula (6):

$$\sigma_c^{2}=\log\big(1+e^{\alpha_c}\big)+\varepsilon,\qquad \alpha\in\mathbb{R}^{C},\ c=1,\dots,C \tag{6}$$

in formula (6), C, H and W represent the channel number, height and width of z_1 and z_2; ε is a positive constant; ℝ^C represents the real space of dimension C, in which the learnable parameter α lives;

in the sixth step, after substituting formula (6) into formula (5) and taking the logarithm, the variational distribution q(z_1|z_2) gives the following formula (7):

$$\log q(z_1\mid z_2)=-\sum_{c=1}^{C}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(\log\sigma_c+\frac{\big(z_{1,c,h,w}-\mu_{c,h,w}(z_2)\big)^{2}}{2\sigma_c^{2}}\right)+\mathrm{const} \tag{7}$$

in formula (7), const is a constant; formula (7) expresses the intractable mutual information term as a variational mutual information term that is easy to solve and implement, and through formula (7) the intermediate latent variable of the time-frequency domain U-shaped network model can learn more supervision information from the pre-trained encoder, achieving the effect of prior knowledge compensation.
CN202011215182.6A 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method Active CN112308941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011215182.6A CN112308941B (en) 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011215182.6A CN112308941B (en) 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method

Publications (2)

Publication Number Publication Date
CN112308941A CN112308941A (en) 2021-02-02
CN112308941B true CN112308941B (en) 2023-06-20

Family

ID=74325545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011215182.6A Active CN112308941B (en) 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method

Country Status (1)

Country Link
CN (1) CN112308941B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2014115603A (en) * 2014-04-18 2015-10-27 Федеральное государственное бюджетное учреждение науки Научно-технологический центр уникального приборостроения РАН ACOUSTOPTIC DEVICE FOR RECEIVING SPECTRAL STEREO IMAGES WITH SPECTRAL TRANSFORMATION
CN111507884A (en) * 2020-04-19 2020-08-07 衡阳师范学院 Self-adaptive image steganalysis method and system based on deep convolutional neural network
CN111640069A (en) * 2020-04-17 2020-09-08 上海交通大学 Compressive imaging method, system and device based on light sensing network and phase compensation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2754388B1 (en) * 2013-01-15 2020-09-09 Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt GmbH System and method for quality-enhanced high-rate optoacoustic imaging of an object
US20200342304A1 (en) * 2019-04-25 2020-10-29 International Business Machines Corporation Feature importance identification in deep learning models
CN111915691B (en) * 2019-05-07 2023-08-22 上海科技大学 Image processing system, method, terminal and medium based on neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2014115603A (en) * 2014-04-18 2015-10-27 Федеральное государственное бюджетное учреждение науки Научно-технологический центр уникального приборостроения РАН ACOUSTOPTIC DEVICE FOR RECEIVING SPECTRAL STEREO IMAGES WITH SPECTRAL TRANSFORMATION
CN111640069A (en) * 2020-04-17 2020-09-08 上海交通大学 Compressive imaging method, system and device based on light sensing network and phase compensation
CN111507884A (en) * 2020-04-19 2020-08-07 衡阳师范学院 Self-adaptive image steganalysis method and system based on deep convolutional neural network

Also Published As

Publication number Publication date
CN112308941A (en) 2021-02-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant