CN112308941A - Limited-view photoacoustic image reconstruction method based on mutual information - Google Patents

Limited-view photoacoustic image reconstruction method based on mutual information

Info

Publication number
CN112308941A
CN112308941A (application CN202011215182.6A)
Authority
CN
China
Prior art keywords
frequency domain
output
time domain
sharing module
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011215182.6A
Other languages
Chinese (zh)
Other versions
CN112308941B (en)
Inventor
高飞 (Gao Fei)
张佳冬 (Zhang Jiadong)
高峰 (Gao Feng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University filed Critical ShanghaiTech University
Priority to CN202011215182.6A priority Critical patent/CN112308941B/en
Publication of CN112308941A publication Critical patent/CN112308941A/en
Application granted granted Critical
Publication of CN112308941B publication Critical patent/CN112308941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06T 5/80
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10132 - Ultrasound image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a mutual information-based limited-view photoacoustic image reconstruction method. Images of the image domain and of k-space, obtained with time-domain and frequency-domain algorithms, serve as the inputs of a post-processing network; the time-frequency domain U-shaped network model designed by the invention combines the tissue-structure information of the two inputs and removes artifacts, while a mutual information constraint provides the network with rich prior knowledge. Experiments prove that, on the limited-view photoacoustic reconstruction problem, the method is superior to existing mature and recent reconstruction algorithms.

Description

Limited-view photoacoustic image reconstruction method based on mutual information
Technical Field
The invention relates to a limited-view photoacoustic image reconstruction method with time-frequency domain inputs, based on prior-knowledge compensation through mutual information.
Background
Photoacoustic imaging is a non-invasive medical imaging modality that combines the high contrast of light with the deep penetration of sound, and in recent years it has attracted wide attention from scholars at home and abroad. Based on the photoacoustic effect, biological tissue generates ultrasonic signals under optical excitation; after these signals are received by the surrounding ultrasound probes, a photoacoustic image of the biological tissue can be obtained through a reconstruction algorithm.
As an important photoacoustic imaging apparatus, photoacoustic computed tomography (PAT) is developing rapidly thanks to its fast imaging and other features. Photoacoustic images obtained by PAT have already shown advantages that other modalities lack in preclinical and clinical applications such as blood oxygen saturation quantification, small animal imaging, and the benign/malignant assessment of breast tumors. The accuracy of the diagnostic result depends strongly on the quality of the reconstructed photoacoustic image, which means that the photoacoustic reconstruction algorithm determines the value of a photoacoustic apparatus for clinical applications.
At present, the quality of photoacoustic images obtained by conventional reconstruction algorithms is low. These reconstruction algorithms often introduce artifact information, which makes diagnosis by a physician difficult. In practical clinical applications especially, owing to the limitations of space and of the detection environment, the ultrasound probe often covers only a portion of the patient, causing a limited-view problem. In this case, conventional reconstruction algorithms further cause the biological tissue information to be mixed into the artifacts and become unresolvable.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: under the limited-view condition, existing photoacoustic image reconstruction algorithms lose information and introduce artifact information.
To solve the above technical problem, the technical scheme of the invention is to provide a mutual information-based limited-view photoacoustic image reconstruction method, characterized by comprising the following steps:
for the ultrasonic signals received by the ultrasound probe, a DAS image x_i is obtained using a time-domain reconstruction algorithm, while a k-space image x_k is obtained using a frequency-domain reconstruction algorithm;
the DAS image x_i and the k-space image x_k are input into a time-frequency domain U-shaped network model to reconstruct the photoacoustic image, wherein the time-frequency domain U-shaped network model adopts an encoder-decoder structure:
after the DAS image x_i is input into the encoder, the real part of the complex result of a fast Fourier transform is taken and concatenated along the channel dimension with the k-space image x_k input into the encoder, and the result is input into the first information sharing module; after the k-space image x_k is input into the encoder, the real part of the complex result of an inverse fast Fourier transform is taken and concatenated along the channel dimension with the DAS image x_i input into the encoder, and the result is input into the first information sharing module;
the first time-domain output and the first frequency-domain output obtained by the first information sharing module pass through the first maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the second information sharing module; the second time-domain output and the second frequency-domain output obtained by the second information sharing module pass through the second maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the third information sharing module; the third time-domain output and the third frequency-domain output obtained by the third information sharing module pass through the third maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the fourth information sharing module; after the fourth time-domain output and the fourth frequency-domain output of the fourth information sharing module pass through the fourth maximum pooling layer, the fourth frequency-domain output is inverse-Fourier-transformed and connected with the fourth time-domain output on the channel dimension, and the intermediate latent representation z_2 is then obtained through convolution kernel calculation; the intermediate latent representation z_2 is the final output of the encoder;
the first information sharing module, the second information sharing module, the third information sharing module and the fourth information sharing module perform the same processing on their time-domain input and frequency-domain input; denoting any one of them as the information sharing module, the following steps are performed:
the information sharing module extracts shallow features of time domain input and outputs the shallow features to a decoder;
the information sharing module interactively fuses the extracted high-level features of the time domain input, the extracted high-level features of the frequency domain input after being converted into the time domain and the original features of the time domain input to form a first time domain output, a second time domain output, a third time domain output or a fourth time domain output;
the information sharing module interactively fuses the extracted high-level features of the frequency domain input, the extracted high-level features of the time domain input after being converted into the frequency domain and the original features of the frequency domain input to form a first frequency domain output, a second frequency domain output, a third frequency domain output or a fourth frequency domain output;
the decoder recovers the intermediate latent representation z_2 output by the encoder into a photoacoustic image.
Preferably, the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer and the fourth maximum pooling layer are all 2 × 2 maximum pooling layers.
Preferably, the information sharing module directly outputs the time domain input to extract its shallow features, and this output is connected to the corresponding decoder portion after two 3 × 3 convolution kernel calculations to retain the shallow information of the time domain, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
Preferably, the time domain input is subjected to two convolution kernel calculations of 3 × 3 and two residual convolution layer calculations in the information sharing module to extract high-level features of the time domain input, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations;
the time domain input is subjected to fast Fourier transform in the information sharing module, and then is subjected to two 3 x 3 convolution kernel calculations and two residual convolution layer calculations to extract high-level characteristics of the time domain input transformed to a frequency domain, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
the time domain input is not subjected to any computation within the information sharing module to preserve the original characteristics of the time domain input.
Preferably, the high-level features of the time domain input, the high-level features of the frequency domain input after being transformed into the time domain, and the original features of the time domain input are subjected to two 3 × 3 convolution kernel calculations after channel connection to obtain a first time domain output, a second time domain output, a third time domain output, or a fourth time domain output, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
Preferably, the frequency domain input is subjected to two convolution kernel calculations of 3 × 3 and two residual convolution layer calculations in the information sharing module to extract high-level features of the frequency domain input, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations;
the frequency domain input is subjected to inverse fast Fourier transform in the information sharing module, and then is subjected to two 3 x 3 convolution kernel calculations and two residual convolution layer calculations to extract high-level features of the frequency domain input after being transformed into a time domain, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
the frequency domain input is not subjected to any computation within the information sharing module to preserve the original characteristics of the frequency domain input.
Preferably, the high-level features of the frequency domain input, the high-level features of the time domain input after being transformed into the frequency domain, and the original features of the frequency domain input are subjected to two 3 × 3 convolution kernel calculations after channel connection to obtain a first frequency domain output, a second frequency domain output, a third frequency domain output, or a fourth frequency domain output, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
Preferably, the intermediate latent representation z_2 is processed in the decoder through the following steps:
step 1, the intermediate latent representation z_2 first goes through an upsampling convolution, is then spliced on the channel dimension with the shallow features output by the fourth information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolution operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 2, the output obtained in step 1 is spliced on the channel dimension with the shallow features output by the third information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolutional layer operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 3, the output obtained in step 2 is spliced on the channel dimension with the shallow features output by the second information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolutional layer operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 4, the shallow features output by the first information sharing module are spliced on the channel dimension with the output obtained in step 3, which then goes through two successive 3 × 3 convolution kernel calculations and a final 1 × 1 convolution kernel calculation to obtain the final photoacoustic image, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation.
Preferably, a mutual information constraint on the intermediate latent variables is used to compensate prior knowledge in the training phase of the time-frequency domain U-shaped network model.
Preferably, compensating the prior knowledge using the mutual information constraint on the intermediate latent variables specifically comprises the following steps:
in the first step, the reconstruction objective function of the time-frequency domain U-shaped network model is given by the following formula (1):

    max_θ E_θ[ -||f_θ(x_i, x_k) - y||^2 ]    (1)

in formula (1), θ represents all trainable parameters in the time-frequency domain U-shaped network model; y represents the true photoacoustic image; E_θ[·] represents the mathematical expectation with respect to the parameter θ; f_θ(·) represents the time-frequency domain U-shaped network model;
in the second step, an auxiliary network is pre-trained to provide the intermediate latent representation z_1 used in the mutual information constraint;
in the third step, after the mutual information constraint is added, the reconstruction objective function of the time-frequency domain U-shaped network model in its training stage is represented by the following formula (2):

    max_θ { E_θ[ -||f_θ(x_i, x_k) - y||^2 ] + λ I(z_1, z_2) }    (2)

in formula (2), I(·,·) represents mutual information; λ represents the penalty coefficient of the mutual information constraint; z_2 is the intermediate latent representation output by the encoder of the time-frequency domain U-shaped network model;
in the fourth step, the variational lower bound of the objective function in formula (2) is optimized:

with p(z_1|z_2) representing the true conditional distribution, the variational distribution q(z_1|z_2) is used as an approximate substitute for p(z_1|z_2); the mutual information term I(z_1, z_2) is then represented by the following formula (3):

    I(z_1, z_2) = H(z_1) - H(z_1|z_2)
                = H(z_1) + E_{z_1,z_2}[ log q(z_1|z_2) ] + E_{z_2}[ D_KL( p(z_1|z_2) || q(z_1|z_2) ) ]    (3)

in formula (3), H(z_1) represents the information entropy of the intermediate latent representation z_1; H(z_1|z_2) represents the conditional entropy of the intermediate latent representation z_1 given the intermediate latent representation z_2; E_{z_1,z_2}[·] represents the mathematical expectation with respect to the parameters z_1, z_2; D_KL(·||·) represents the KL divergence;

since z_1 remains unchanged during training and the KL divergence is non-negative, the variational lower bound of the reconstruction objective function is obtained from formula (3), as shown in the following formula (4):

    max_θ { E_θ[ -||f_θ(x_i, x_k) - y||^2 ] + λ E_{z_1,z_2}[ log q(z_1|z_2) ] }    (4)
in the fifth step, a Gaussian distribution is selected as the variational distribution q(z_1|z_2), as shown in the following formula (5):

    q(z_1|z_2) = N( z_1 ; μ(z_2), σ(z_2) )    (5)

in formula (5), μ(z_2) represents the mean of the Gaussian distribution and is calculated from the intermediate latent variable z_2 by a 1 × 1 convolution layer;
σ(z_2) represents the variance of the Gaussian distribution and is calculated using the following formula (6):

    σ(z_2) = (1/(H·W)) Σ_{h,w} ( z_2 - μ(z_2) )² + ε,   σ(z_2) ∈ R^C    (6)
in formula (6), C, H, W represent the channel number, height and width of z_1 and z_2; ε is a positive constant; R^C represents the real space of dimension C;
in the sixth step, after substituting formula (6) into formula (5) and taking the logarithm, the variational distribution q(z_1|z_2) is represented by the following formula (7):

    log q(z_1|z_2) = -(1/2) Σ_{c=1}^{C} [ ( z_1 - μ(z_2) )² / σ(z_2) + log σ(z_2) ]_c + c    (7)

in formula (7), c is a constant. Formula (7) rewrites the intractable mutual information term as a variational mutual information term that is easy to solve and to implement in code; through formula (7), the intermediate latent variables of the time-frequency domain U-shaped network model can learn more supervision information from the encoder, achieving the effect of prior-knowledge compensation.
The invention provides a method in which images of the image domain and of k-space, obtained with time-domain and frequency-domain algorithms, serve as the inputs of a post-processing network. The time-frequency domain U-shaped network model designed by the invention combines the tissue-structure information of the two inputs and removes artifacts, while the mutual information constraint provides the network with rich prior knowledge. Experiments prove that, on the limited-view photoacoustic reconstruction problem, the method is superior to existing mature and recent reconstruction algorithms.
Drawings
FIG. 1 is a general schematic diagram of DuDoUnet;
FIG. 2 is a schematic diagram of an ISB structure;
FIG. 3 is a general diagram of a mutual information constraint framework;
FIG. 4 shows the simulation results (three regions are marked and enlarged below each picture, where the differences in reconstruction are evident).
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Aiming at the limited-view photoacoustic reconstruction problem, the invention provides a mutual information-based limited-view photoacoustic image reconstruction method, which comprises the following steps:
For the ultrasonic signals received by the ultrasound probe, a DAS image x_i is obtained using a time-domain reconstruction algorithm, while a k-space image x_k is obtained using a frequency-domain reconstruction algorithm. The two reconstruction algorithms reconstruct the same biological tissue information but introduce different artifact information, which teaches the post-processing network, at the input level, which parts are tissue and which are artifacts.
In order to further distinguish these two kinds of information, the invention designs a variant of Unet, namely a time-frequency domain U-shaped network (Dual Domain Unet, hereinafter the DuDoUnet model), whose overall structure is shown in FIG. 1.
The DuDoUnet model takes as inputs the DAS image x_i obtained by the time-domain reconstruction algorithm and the k-space image x_k obtained by the frequency-domain reconstruction algorithm. In FIG. 1, FFT and IFFT denote the fast Fourier transform and the inverse fast Fourier transform, respectively, and Concat denotes that the inputs are connected in the channel dimension. This allows the two inputs, from the image domain and from k-space, to share information with each other and to distinguish artifacts. The whole DuDoUnet model is regarded as an encoder-decoder structure, where the specific structure and operation flow of the encoder are as follows:
The DAS image x_i has size 128 × 128 × 1 and the k-space image x_k has size 128 × 128 × 2, where the two channels are the real and imaginary parts of the k-space image x_k. After the fast Fourier transform of the DAS image x_i, the real part of the complex result is taken, with size 128 × 128 × 1; after concatenation with the k-space image x_k along the channel dimension, the size becomes 128 × 128 × 3. Likewise, after the inverse fast Fourier transform of the k-space image x_k, the real part of the complex result is taken, with size 128 × 128 × 1; after concatenation with the DAS image x_i along the channel dimension, the size becomes 128 × 128 × 2.
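As an illustration of the input preparation just described, a minimal sketch in PyTorch follows; the framework, the function name prepare_dual_domain_inputs, and the (batch, channel, height, width) tensor layout are our assumptions, not part of the patent.

```python
import torch

def prepare_dual_domain_inputs(x_i, x_k):
    """Builds the two encoder inputs from the DAS and k-space images.

    x_i: DAS image, shape (B, 1, 128, 128), real-valued.
    x_k: k-space image, shape (B, 2, 128, 128); the two channels are the
         real and imaginary parts.
    """
    # Frequency-domain branch: real part of the FFT of the DAS image,
    # concatenated with the k-space image -> (B, 3, 128, 128)
    x_i_fft_real = torch.fft.fft2(x_i).real
    freq_in = torch.cat([x_i_fft_real, x_k], dim=1)

    # Time-domain branch: real part of the inverse FFT of the k-space image,
    # concatenated with the DAS image -> (B, 2, 128, 128)
    x_k_complex = torch.complex(x_k[:, 0:1], x_k[:, 1:2])
    x_k_ifft_real = torch.fft.ifft2(x_k_complex).real
    time_in = torch.cat([x_i, x_k_ifft_real], dim=1)
    return time_in, freq_in
```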
The time-frequency domain input then enters a first Information Sharing Block (ISB). The specific structure of ISB is shown in fig. 2.
The ISB has two inputs, a time domain input and a frequency domain input. The time domain input in ISB accomplishes two different tasks:
task 1: and (4) directly outputting. This part is used to extract shallow features in the time domain.
Task 2: the method is divided into three parts for time domain feature extraction and time domain to frequency domain feature conversion, and further comprises the following steps:
task 2.1: two 3 x 3 convolution kernel calculations and two residual convolution layer calculations are performed. Each convolution operation is additionally followed by batch normalization and ReLU activation operations. This part is used to extract high-level features in the time domain.
Task 2.2: and (4) directly outputting. This part is used to preserve the original characteristics of the time domain input layer.
Task 2.3: after the fast Fourier transform, two 3 × 3 convolution kernel calculations and two residual convolution layer calculations are performed. This part is used to extract the high-level features after the time domain is transformed into the frequency domain.
It is noted that although task 1 and task 2.2 are identical, the corresponding output operations are different and the degree of importance is different.
ISB only requires the frequency domain input to complete one task, namely task 3:
task 3: the method is divided into three parts for frequency domain feature extraction and frequency domain to time domain feature conversion, and further comprises the following steps:
task 3.1: two 3 x 3 convolution kernel calculations and two residual convolution layer calculations are performed. This part is used to extract the high-level features of the frequency domain.
Task 3.2: and (4) directly outputting. This part is used to preserve the original characteristics of the frequency domain input layer.
Task 3.3: after the inverse fast Fourier transform, two 3 × 3 convolution kernel calculations and two residual convolution layer calculations are performed. This part is used to extract the high-level features after the frequency domain is transformed into the time domain.
The ISB has three outputs, namely a connection output, a time domain output, and a frequency domain output:
and (3) connection and output: the output in task 1 is subjected to two convolution kernel calculations of 3 × 3 and then connected to the corresponding decoder portion to retain the shallow information in the time domain.
And (3) time domain output: and connecting the outputs of the task 2.1, the task 2.2 and the task 3.3 on a channel, and outputting after calculating by two convolution kernels of 3 multiplied by 3, wherein the part ensures the interactive fusion of time domain information.
And (3) frequency domain output: and connecting the outputs of the task 3.1, the task 3.2 and the task 2.3 on a channel, and outputting after calculating by two convolution kernels of 3 multiplied by 3, wherein the part ensures the interactive fusion of frequency domain information.
The ISB is a general-purpose module in the DuDoUnet model: it can adapt to different numbers of input channels and output features with the required number of channels. Its unique time-frequency domain transformations ensure that the photoacoustic image information of the image domain and of k-space is shared, while the different tasks do not interfere with each other on account of their time-frequency domain characteristics.
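A condensed sketch of one ISB, again assuming PyTorch, is given below. The class names and the exact composition of the residual convolution layers are our assumptions; taking the real part after the internal FFT/IFFT mirrors the input preparation and is likewise an assumption. The task structure follows the description above.

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out):
    # 3x3 convolution followed by batch normalization and ReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ResidualConv(nn.Module):
    # One residual convolution layer (simplified; the patent does not detail it)
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(conv_bn_relu(c, c), conv_bn_relu(c, c))
    def forward(self, x):
        return x + self.body(x)

class ISB(nn.Module):
    def __init__(self, c_time_in, c_freq_in, c_out):
        super().__init__()
        def deep(c_in):  # two 3x3 convs + two residual convolution layers
            return nn.Sequential(conv_bn_relu(c_in, c_out), conv_bn_relu(c_out, c_out),
                                 ResidualConv(c_out), ResidualConv(c_out))
        # Task 1: shallow path, connected to the decoder after two 3x3 convs
        self.shallow = nn.Sequential(conv_bn_relu(c_time_in, c_out),
                                     conv_bn_relu(c_out, c_out))
        self.time_deep = deep(c_time_in)   # task 2.1
        self.time2freq = deep(c_time_in)   # task 2.3 (applied after FFT)
        self.freq_deep = deep(c_freq_in)   # task 3.1
        self.freq2time = deep(c_freq_in)   # task 3.3 (applied after IFFT)
        # Fusion convs producing the time-domain and frequency-domain outputs
        self.fuse_t = nn.Sequential(conv_bn_relu(2 * c_out + c_time_in, c_out),
                                    conv_bn_relu(c_out, c_out))
        self.fuse_f = nn.Sequential(conv_bn_relu(2 * c_out + c_freq_in, c_out),
                                    conv_bn_relu(c_out, c_out))

    def forward(self, t_in, f_in):
        skip = self.shallow(t_in)                            # connection output
        t_feat = self.time_deep(t_in)                        # task 2.1
        t_as_f = self.time2freq(torch.fft.fft2(t_in).real)   # task 2.3
        f_feat = self.freq_deep(f_in)                        # task 3.1
        f_as_t = self.freq2time(torch.fft.ifft2(f_in).real)  # task 3.3
        # Time-domain output: tasks 2.1 + 3.3 + original time input (task 2.2)
        t_out = self.fuse_t(torch.cat([t_feat, f_as_t, t_in], dim=1))
        # Frequency-domain output: tasks 3.1 + 2.3 + original frequency input (task 3.2)
        f_out = self.fuse_f(torch.cat([f_feat, t_as_f, f_in], dim=1))
        return skip, t_out, f_out

# First ISB of the encoder: 2 time channels, 3 frequency channels, 64 output channels
# isb1 = ISB(c_time_in=2, c_freq_in=3, c_out=64)
```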
After the first ISB, connection output one is obtained, and time-domain output one and frequency-domain output one both have size 128 × 128 × 64. Time-domain output one and frequency-domain output one pass through a 2 × 2 maximum pooling layer, and their sizes both become 64 × 64 × 64. These two outputs then serve as the time-domain input and frequency-domain input of the second ISB; after the second ISB is calculated, connection output two is obtained, and time-domain output two and frequency-domain output two both have size 64 × 64 × 128. Time-domain output two and frequency-domain output two pass through a 2 × 2 maximum pooling layer, and their sizes both become 32 × 32 × 128. These two outputs then serve as the time-domain input and frequency-domain input of the third ISB; after the third ISB is calculated, connection output three is obtained, and time-domain output three and frequency-domain output three both have size 32 × 32 × 256. Time-domain output three and frequency-domain output three pass through a 2 × 2 maximum pooling layer, and their sizes both become 16 × 16 × 256. These two outputs then serve as the time-domain input and frequency-domain input of the fourth ISB; after the fourth ISB is calculated, connection output four is obtained, and time-domain output four and frequency-domain output four both have size 16 × 16 × 512. Time-domain output four and frequency-domain output four pass through a 2 × 2 maximum pooling layer, and their sizes become 8 × 8 × 512. Frequency-domain output four is then inverse-Fourier-transformed and concatenated with time-domain output four along the channel dimension, and the size becomes 8 × 8 × 1024. After two 3 × 3 convolution kernel calculations, the intermediate latent representation z_2, i.e. the final output of the encoder, is obtained, with size 8 × 8 × 512.
The specific structure and operation flow of the decoder are as follows:
The intermediate latent representation z_2 passes through a 2 × 2 upsampling convolutional layer and its size becomes 16 × 16 × 512; after splicing with connection output four of the fourth ISB along the channel dimension, an output of size 16 × 16 × 1024 is obtained. This output continues through two 3 × 3 convolution kernel calculations and its size becomes 16 × 16 × 512. The output then goes through a 2 × 2 upsampling convolutional layer operation, and the size becomes 32 × 32 × 256. After splicing with connection output three of the third ISB along the channel dimension, an output of size 32 × 32 × 512 is obtained. This output continues through two 3 × 3 convolution kernel calculations and its size becomes 32 × 32 × 256. The output then goes through a 2 × 2 upsampling convolutional layer operation, and the size becomes 64 × 64 × 128. After splicing with connection output two of the second ISB along the channel dimension, an output of size 64 × 64 × 256 is obtained. This output continues through two 3 × 3 convolution kernel calculations and its size becomes 64 × 64 × 128. The output then goes through a 2 × 2 upsampling convolutional layer operation, and the size becomes 128 × 128 × 64. After splicing with connection output one of the first ISB along the channel dimension, an output of size 128 × 128 × 128 is obtained. This output continues through two 3 × 3 convolution kernel calculations and its size becomes 128 × 128 × 64. Finally, the output goes through one 1 × 1 convolution kernel calculation to obtain the final output, whose size equals that of the ground-truth image (GT), namely 128 × 128 × 1.
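One decoder stage can be sketched as follows, assuming PyTorch; the class name DecoderStage and the use of ConvTranspose2d for the 2 × 2 upsampling convolution are our assumptions.

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """2x2 up-convolution, channel splice with the ISB connection output,
    then two 3x3 conv + batch-norm + ReLU blocks."""
    def __init__(self, c_in, c_up, c_skip, c_out):
        super().__init__()
        self.up = nn.ConvTranspose2d(c_in, c_up, kernel_size=2, stride=2)
        self.convs = nn.Sequential(
            nn.Conv2d(c_up + c_skip, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

    def forward(self, x, skip):
        return self.convs(torch.cat([self.up(x), skip], dim=1))

# Following the sizes above, the first stage maps z2 (8x8x512) plus connection
# output four (16x16x512) to 16x16x512: stage1 = DecoderStage(512, 512, 512, 512)
```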
We use the real photoacoustic image y as the supervision label, f_θ(·) to denote the DuDoUnet model, and θ to denote all trainable parameters in the DuDoUnet model. The overall reconstruction objective function of DuDoUnet can be expressed as:

    max_θ E_θ[ -||f_θ(x_i, x_k) - y||^2 ]    (1)
In the experiments, we chose the mean squared error (MSE) as the reconstruction loss function.
The limited-view photoacoustic inputs can learn prior knowledge only through the supervision label; however, as the network deepens, the vanishing-gradient problem easily occurs, and if the decoder of the network is powerful enough, the encoder cannot learn useful information, at which point the reconstruction model degenerates into a generative model. In this regard, the invention proposes to compensate prior knowledge with a mutual information constraint on the intermediate latent variables during model training. The overall framework is shown in FIG. 3.
To make the encoder of the DuDoUnet model learn a rich prior, an auxiliary network (an autoencoder in our experiments) is pre-trained to provide an intermediate latent representation z_1, i.e. the output of the encoder in the autoencoder, and mutual information is then used to constrain z_1 and the intermediate latent representation z_2 extracted by the DuDoUnet model (the output of the encoder in the DuDoUnet model). The mutual information constraint ensures that the encoder of the DuDoUnet model can learn more prior knowledge not only from the supervision label but also through the intermediate latent representation, thereby achieving the effect of prior-knowledge compensation.
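The text specifies the auxiliary network only as an autoencoder whose encoder output is z_1; a minimal sketch consistent with the 8 × 8 × 512 size of z_2 might look like the following, where every layer choice is an assumption.

```python
import torch.nn as nn

class AuxAutoencoder(nn.Module):
    """Pre-trained, e.g. to reconstruct the ground-truth image y; its encoder
    output z1 is then kept fixed and used in the mutual information constraint."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(   # 128x128x1 -> 8x8x512
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(   # 8x8x512 -> 128x128x1
            nn.ConvTranspose2d(512, 256, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 2, stride=2))

    def forward(self, y):
        z1 = self.encoder(y)            # intermediate latent representation z1
        return self.decoder(z1), z1
```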
After the mutual information constraint is added, in the model training phase, the overall objective function can be expressed as:
Figure BDA0002760120970000092
in formula (2), I (·,) represents mutual information; λ represents a penalty coefficient of mutual information constraint;
Figure BDA0002760120970000101
All parameters of the model can be obtained by maximizing formula (2). Solving formula (2) directly, however, is very difficult, so we optimize only its variational lower bound. We use p(z_1|z_2) to represent the true conditional distribution; since p(z_1|z_2) is usually unknown, we use the variational distribution q(z_1|z_2) as an approximate substitute for it. The mutual information term can thus be expressed as:

    I(z_1, z_2) = H(z_1) - H(z_1|z_2)
                = H(z_1) + E_{z_1,z_2}[ log q(z_1|z_2) ] + E_{z_2}[ D_KL( p(z_1|z_2) || q(z_1|z_2) ) ]    (3)
in the formula (3), H (z)1) Representing intermediate potential tokens z1The entropy of the information of (1); h (z)1|z2) Representing a given intermediate potential representation z2Potential characterization of the posterior middle z1The conditional entropy of (a);
Figure BDA0002760120970000103
is expressed with respect to a parameter z1,z2A mathematical expectation of (d);
Figure BDA0002760120970000104
the representation represents the KL divergence. Because of z1Is an intermediate potential characterization provided by the pre-training network, so z is the training process1Remaining unchanged, and the KL divergence being non-negative, we can remove these two terms to obtain the lower bound of the variation of the objective function as shown in equation (4):
Figure BDA0002760120970000105
we select the Gaussian distribution as the variational distribution q (z)1|z2):
Figure BDA0002760120970000106
Here μ(z_2) and σ(z_2) represent the mean and variance of the Gaussian distribution. Computing the intermediate latent variable z_2 with a 1 × 1 convolution layer gives the mean μ(z_2) of the Gaussian distribution. Calculating the variance σ(z_2) with the same method, however, would make the network training unstable, so we use formula (6) to calculate the variance:

    σ(z_2) = (1/(H·W)) Σ_{h,w} ( z_2 - μ(z_2) )² + ε,   σ(z_2) ∈ R^C    (6)
In formula (6), C, H, W represent the channel number, height and width of z_1 and z_2; ε is a positive constant; R^C represents the real space of dimension C.
After substituting equation (6) into equation (5) and taking the logarithm, the variation mutual information item can be expressed as:
Figure BDA0002760120970000111
in the formula (7), c is a constant.
Unlike the MSE, formula (7) allows the network to impose penalty constraints of different strengths on different layers; when the variance σ(z_2) is identically 1, formula (7) is equivalent to the MSE.
Formula (7) rewrites the intractable mutual information term as a variational mutual information term that is easy to solve and to implement in code. Through formula (7), the intermediate latent variables of the DuDoUnet model can learn more supervision information from the encoder, achieving the effect of prior-knowledge compensation.
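Putting formulas (2), (6) and (7) together, the training loss can be sketched as follows, assuming PyTorch; the per-channel form of formula (6) is our reading of the text, and all helper names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

mu_conv = nn.Conv2d(512, 512, kernel_size=1)  # 1x1 conv producing mu(z2)

def variational_mi_term(z1, z2, eps=1.0):
    """E[log q(z1|z2)] up to an additive constant, per formula (7).

    z1: fixed latent from the pre-trained autoencoder, shape (B, C, H, W).
    z2: latent from the DuDoUnet encoder, same shape.
    """
    mu = mu_conv(z2)
    # Formula (6) (our reading): per-channel spatial average of (z2 - mu)^2 + eps
    var = ((z2 - mu) ** 2).mean(dim=(2, 3), keepdim=True) + eps
    log_q = -0.5 * ((z1 - mu) ** 2 / var + torch.log(var))
    return log_q.mean()

def total_loss(y_pred, y, z1, z2, lam=1.0):
    # Maximizing formula (4) is equivalent to minimizing MSE - lambda * E[log q];
    # with var identically 1, the MI term reduces to an MSE between z1 and mu(z2).
    return F.mse_loss(y_pred, y) - lam * variational_mi_term(z1, z2)
```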
To demonstrate the effectiveness of the invention, we performed experimental validation on an open dataset. We selected 2000 samples in total, of which 1500 were used for training and 500 were reserved for testing. Photoacoustic signals were generated using the k-Wave photoacoustic simulation toolbox, and x_i and x_k were obtained through the time-domain and frequency-domain reconstruction algorithms. We then obtained the reconstructed photoacoustic images using the mutual-information-constrained DuDoUnet model. In this experiment, the model parameters λ and ε were both set to 1.
To further compare the effectiveness of the method and the model, common reconstruction models such as Unet, Ynet and DEUnet were selected for comparison with our model. Unet is divided into Unet #1 and Unet #2: Unet #1 takes only x_i as input, while Unet #2 takes both x_i and x_k as input but uses a single-path encoder. The Ynet and DEUnet models have a dual-encoder structure, taking x_i and x_k as inputs and extracting information with a separate encoder for each.
We compared the reconstruction results using the structural similarity (SSIM) and the peak signal-to-noise ratio (PSNR).

[Table (image in the original): SSIM and PSNR of Unet #1, Unet #2, Ynet, DEUnet, DuDoUnet and DuDoUnet + MI.]
Without the mutual information constraint, the reconstruction results of the DuDoUnet model already exceed those of the other comparison models; after the mutual information constraint is added, the model obtains even better reconstruction results.
We further compare the reconstructed photoacoustic images visually, as shown in FIG. 4. As can be seen from FIG. 4, DuDoUnet and DuDoUnet + MI restore the structural information of the blood vessels better under the same input conditions.

Claims (10)

1. A mutual information-based limited-view photoacoustic image reconstruction method, characterized by comprising the following steps:
for the ultrasonic signals received by the ultrasound probe, obtaining a DAS image x_i using a time-domain reconstruction algorithm, while obtaining a k-space image x_k using a frequency-domain reconstruction algorithm;
inputting the DAS image x_i and the k-space image x_k into a time-frequency domain U-shaped network model to reconstruct the photoacoustic image, wherein the time-frequency domain U-shaped network model adopts an encoder-decoder structure:
after the DAS image x_i is input into the encoder, the real part of the complex result of a fast Fourier transform is taken and concatenated along the channel dimension with the k-space image x_k input into the encoder, and the result is input into the first information sharing module; after the k-space image x_k is input into the encoder, the real part of the complex result of an inverse fast Fourier transform is taken and concatenated along the channel dimension with the DAS image x_i input into the encoder, and the result is input into the first information sharing module;
the first time-domain output and the first frequency-domain output obtained by the first information sharing module pass through the first maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the second information sharing module; the second time-domain output and the second frequency-domain output obtained by the second information sharing module pass through the second maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the third information sharing module; the third time-domain output and the third frequency-domain output obtained by the third information sharing module pass through the third maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the fourth information sharing module; after the fourth time-domain output and the fourth frequency-domain output of the fourth information sharing module pass through the fourth maximum pooling layer, the fourth frequency-domain output is inverse-Fourier-transformed and connected with the fourth time-domain output on the channel dimension, and the intermediate latent representation z_2 is then obtained through convolution kernel calculation; the intermediate latent representation z_2 is the final output of the encoder;
the first information sharing module, the second information sharing module, the third information sharing module and the fourth information sharing module perform the same processing on their time-domain input and frequency-domain input; denoting any one of them as the information sharing module, the following steps are performed:
the information sharing module extracts shallow features of time domain input and outputs the shallow features to a decoder;
the information sharing module interactively fuses the extracted high-level features of the time domain input, the extracted high-level features of the frequency domain input after being converted into the time domain and the original features of the time domain input to form a first time domain output, a second time domain output, a third time domain output or a fourth time domain output;
the information sharing module interactively fuses the extracted high-level features of the frequency domain input, the extracted high-level features of the time domain input after being converted into the frequency domain and the original features of the frequency domain input to form a first frequency domain output, a second frequency domain output, a third frequency domain output or a fourth frequency domain output;
the decoder recovers the intermediate latent representation z_2 output by the encoder into a photoacoustic image.
2. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer and the fourth maximum pooling layer are all 2 × 2 maximum pooling layers.
3. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the information sharing module directly outputs the time domain input to extract its shallow features, and this output is connected to the corresponding decoder portion after two 3 × 3 convolution kernel calculations to retain the shallow information of the time domain, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
4. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the time domain input is subjected to two 3 × 3 convolution kernel calculations and two residual convolution layer calculations in the information sharing module to extract the high-level features of the time domain input, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations;
the time domain input is subjected to fast Fourier transform in the information sharing module, and then is subjected to two 3 x 3 convolution kernel calculations and two residual convolution layer calculations to extract high-level characteristics of the time domain input transformed to a frequency domain, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
the time domain input is not subjected to any computation within the information sharing module to preserve the original characteristics of the time domain input.
5. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the high-level features of the time domain input, the high-level features of the frequency domain input after being transformed into the time domain, and the original features of the time domain input are subjected to two 3 × 3 convolution kernel calculations after channel connection to obtain the first time domain output, the second time domain output, the third time domain output, or the fourth time domain output, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
6. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the frequency domain input is subjected to two 3 × 3 convolution kernel calculations and two residual convolution layer calculations in the information sharing module to extract the high-level features of the frequency domain input, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations;
the frequency domain input is subjected to inverse fast Fourier transform in the information sharing module, and then is subjected to two 3 x 3 convolution kernel calculations and two residual convolution layer calculations to extract high-level features of the frequency domain input after being transformed into a time domain, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
the frequency domain input is not subjected to any computation within the information sharing module to preserve the original characteristics of the frequency domain input.
7. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the high-level features of the frequency domain input, the high-level features of the time domain input after being transformed into the frequency domain, and the original features of the frequency domain input are subjected to two 3 × 3 convolution kernel calculations after channel connection to obtain the first frequency domain output, the second frequency domain output, the third frequency domain output, or the fourth frequency domain output, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
8. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the intermediate latent representation z_2 is processed in the decoder through the following steps:
step 1, the intermediate latent representation z_2 first goes through an upsampling convolution, is then spliced on the channel dimension with the shallow features output by the fourth information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolution operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 2, the output obtained in step 1 is spliced on the channel dimension with the shallow features output by the third information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolutional layer operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 3, the output obtained in step 2 is spliced on the channel dimension with the shallow features output by the second information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolutional layer operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 4, the shallow features output by the first information sharing module are spliced on the channel dimension with the output obtained in step 3, which then goes through two successive 3 × 3 convolution kernel calculations and a final 1 × 1 convolution kernel calculation to obtain the final photoacoustic image, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation.
9. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein a mutual information constraint on the intermediate latent variables is used to compensate prior knowledge in the training phase of the time-frequency domain U-shaped network model.
10. The mutual information-based limited-view photoacoustic image reconstruction method of claim 9, wherein compensating the prior knowledge using the mutual information constraint on the intermediate latent variables specifically comprises the following steps:
in the first step, the reconstruction objective function of the time-frequency domain U-shaped network model is given by the following formula (1):

    max_θ E_θ[ -||f_θ(x_i, x_k) - y||^2 ]    (1)

in formula (1), θ represents all trainable parameters in the time-frequency domain U-shaped network model; y represents the true photoacoustic image; E_θ[·] represents the mathematical expectation with respect to the parameter θ; f_θ(·) represents the time-frequency domain U-shaped network model;
in the second step, an auxiliary network is pre-trained to provide the intermediate latent representation z_1 used in the mutual information constraint;
in the third step, after the mutual information constraint is added, the reconstruction objective function of the time-frequency domain U-shaped network model in its training stage is represented by the following formula (2):

    max_θ { E_θ[ -||f_θ(x_i, x_k) - y||^2 ] + λ I(z_1, z_2) }    (2)

in formula (2), I(·,·) represents mutual information; λ represents the penalty coefficient of the mutual information constraint; z_2 is the intermediate latent representation output by the encoder of the time-frequency domain U-shaped network model;
in the fourth step, the variational lower bound of the objective function in formula (2) is optimized:

with p(z_1|z_2) representing the true conditional distribution, the variational distribution q(z_1|z_2) is used as an approximate substitute for p(z_1|z_2); the mutual information term I(z_1, z_2) is then represented by the following formula (3):

    I(z_1, z_2) = H(z_1) - H(z_1|z_2)
                = H(z_1) + E_{z_1,z_2}[ log q(z_1|z_2) ] + E_{z_2}[ D_KL( p(z_1|z_2) || q(z_1|z_2) ) ]    (3)

in formula (3), H(z_1) represents the information entropy of the intermediate latent representation z_1; H(z_1|z_2) represents the conditional entropy of the intermediate latent representation z_1 given the intermediate latent representation z_2; E_{z_1,z_2}[·] represents the mathematical expectation with respect to the parameters z_1, z_2; D_KL(·||·) represents the KL divergence;

since z_1 remains unchanged during training and the KL divergence is non-negative, the variational lower bound of the reconstruction objective function is obtained from formula (3), as shown in the following formula (4):

    max_θ { E_θ[ -||f_θ(x_i, x_k) - y||^2 ] + λ E_{z_1,z_2}[ log q(z_1|z_2) ] }    (4)
in the fifth step, a Gaussian distribution is selected as the variational distribution q(z_1|z_2), as shown in the following formula (5):

    q(z_1|z_2) = N( z_1 ; μ(z_2), σ(z_2) )    (5)

in formula (5), μ(z_2) represents the mean of the Gaussian distribution and is calculated from the intermediate latent variable z_2 by a 1 × 1 convolution layer;
σ(z_2) represents the variance of the Gaussian distribution and is calculated using the following formula (6):

    σ(z_2) = (1/(H·W)) Σ_{h,w} ( z_2 - μ(z_2) )² + ε,   σ(z_2) ∈ R^C    (6)
in formula (6), C, H, W represent the channel number, height and width of z_1 and z_2; ε is a positive constant; R^C represents the real space of dimension C;
in the sixth step, after substituting formula (6) into formula (5) and taking the logarithm, the variational distribution q(z_1|z_2) is represented by the following formula (7):

    log q(z_1|z_2) = -(1/2) Σ_{c=1}^{C} [ ( z_1 - μ(z_2) )² / σ(z_2) + log σ(z_2) ]_c + c    (7)

in formula (7), c is a constant. Formula (7) rewrites the intractable mutual information term as a variational mutual information term that is easy to solve and to implement in code; through formula (7), the intermediate latent variables of the time-frequency domain U-shaped network model can learn more supervision information from the encoder, achieving the effect of prior-knowledge compensation.
CN202011215182.6A 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method Active CN112308941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011215182.6A CN112308941B (en) 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011215182.6A CN112308941B (en) 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method

Publications (2)

Publication Number Publication Date
CN112308941A (en) 2021-02-02
CN112308941B CN112308941B (en) 2023-06-20

Family

ID=74325545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011215182.6A Active CN112308941B (en) 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method

Country Status (1)

Country Link
CN (1) CN112308941B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140198606A1 (en) * 2013-01-15 2014-07-17 Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH) System and method for quality-enhanced high-rate optoacoustic imaging of an object
RU2014115603A (en) * 2014-04-18 2015-10-27 Федеральное государственное бюджетное учреждение науки Научно-технологический центр уникального приборостроения РАН ACOUSTOPTIC DEVICE FOR RECEIVING SPECTRAL STEREO IMAGES WITH SPECTRAL TRANSFORMATION
US20200342304A1 (en) * 2019-04-25 2020-10-29 International Business Machines Corporation Feature importance identification in deep learning models
CN111915691A (en) * 2019-05-07 2020-11-10 上海科技大学 Image processing system, method, terminal and medium based on neural network
CN111640069A (en) * 2020-04-17 2020-09-08 上海交通大学 Compressive imaging method, system and device based on light sensing network and phase compensation
CN111507884A (en) * 2020-04-19 2020-08-07 衡阳师范学院 Self-adaptive image steganalysis method and system based on deep convolutional neural network

Also Published As

Publication number Publication date
CN112308941B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
Solomon et al. Deep unfolded robust PCA with application to clutter suppression in ultrasound
AU2013400936B2 (en) Image analysis techniques for diagnosing diseases
Zhou et al. High spatial–temporal resolution reconstruction of plane-wave ultrasound images with a multichannel multiscale convolutional neural network
CN110097512A Construction method and application of a three-dimensional MRI image denoising model based on Wasserstein generative adversarial networks
US11776679B2 (en) Methods for risk map prediction in AI-based MRI reconstruction
CN110610528A (en) Model-based dual-constraint photoacoustic tomography image reconstruction method
Yancheng et al. RED-MAM: A residual encoder-decoder network based on multi-attention fusion for ultrasound image denoising
Qiao et al. A pseudo-siamese feature fusion generative adversarial network for synthesizing high-quality fetal four-chamber views
CN115830016A (en) Medical image registration model training method and equipment
Tehrani et al. Robust scatterer number density segmentation of ultrasound images
Panda et al. A 3D wide residual network with perceptual loss for brain MRI image denoising
CN112819740B (en) Medical image fusion method based on multi-component low-rank dictionary learning
Li et al. Deep attention super-resolution of brain magnetic resonance images acquired under clinical protocols
Lim et al. Motion artifact correction in fetal MRI based on a Generative Adversarial network method
Aja-Fernández et al. Validation of deep learning techniques for quality augmentation in diffusion MRI for clinical studies
CN112308941B (en) Mutual information-based limited view photoacoustic image reconstruction method
Shahid et al. Feasibility of a generative adversarial network for artifact removal in experimental photoacoustic imaging
Ma et al. Edge-guided cnn for denoising images from portable ultrasound devices
Consagra et al. Neural Orientation Distribution Fields for Estimation and Uncertainty Quantification in Diffusion MRI
Seoni et al. Ultrasound Image Beamforming Optimization Using a Generative Adversarial Network
Liu et al. Progressive residual learning with memory upgrade for ultrasound image blind super-resolution
Ekmekci et al. Quantifying Generative Model Uncertainty in Posterior Sampling Methods for Computational Imaging
CN116597041B (en) Nuclear magnetic image definition optimization method and system for cerebrovascular diseases and electronic equipment
CN112085687B (en) Method for converting T1 to STIR image based on detail enhancement
Chen et al. Unsupervised image-to-image translation in multi-parametric MRI of bladder cancer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant