CN112308941A - Limited-view photoacoustic image reconstruction method based on mutual information - Google Patents

Limited-view photoacoustic image reconstruction method based on mutual information

Info

Publication number
CN112308941A
CN112308941A (application CN202011215182.6A)
Authority
CN
China
Prior art keywords
frequency domain
output
time domain
sharing module
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011215182.6A
Other languages
Chinese (zh)
Other versions
CN112308941B (en)
Inventor
高飞 (Gao Fei)
张佳冬 (Zhang Jiadong)
高峰 (Gao Feng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University filed Critical ShanghaiTech University
Priority to CN202011215182.6A priority Critical patent/CN112308941B/en
Publication of CN112308941A publication Critical patent/CN112308941A/en
Application granted granted Critical
Publication of CN112308941B publication Critical patent/CN112308941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06T 5/80
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10132 - Ultrasound image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a mutual information-based limited-view photoacoustic image reconstruction method. Images of the image domain and of k-space, obtained with time-domain and frequency-domain algorithms, serve as the inputs of a post-processing network; the time-frequency domain U-shaped network model designed by the invention combines the tissue-structure information of the two inputs and removes artifacts, while a mutual information constraint provides the network with rich prior knowledge. Experiments prove that, on the limited-view photoacoustic reconstruction problem, the method is superior to existing mature and recent reconstruction algorithms.

Description

Limited-view photoacoustic image reconstruction method based on mutual information
Technical Field
The invention relates to a limited-view photoacoustic image reconstruction method with time-frequency domain inputs, based on prior-knowledge compensation through mutual information.
Background
Photoacoustic imaging is a non-invasive medical imaging modality that combines the high contrast of light with the deep penetration of sound, and in recent years it has attracted wide attention from scholars at home and abroad. Based on the photoacoustic effect, biological tissue generates ultrasonic signals under optical excitation; after these signals are received by the surrounding ultrasound probes, a photoacoustic image of the biological tissue can be obtained through a reconstruction algorithm.
As an important photoacoustic imaging apparatus, photoacoustic computed tomography (PAT) is developing rapidly thanks to its fast imaging and other features. Photoacoustic images obtained by PAT have already shown advantages that other modalities lack in preclinical and clinical applications such as blood oxygen saturation quantification, small animal imaging, and the benign/malignant assessment of breast tumors. The accuracy of the diagnostic result depends strongly on the quality of the reconstructed photoacoustic image, which means that the photoacoustic reconstruction algorithm determines the value of a photoacoustic apparatus for clinical applications.
At present, the quality of photoacoustic images obtained by conventional reconstruction algorithms is low. These reconstruction algorithms often introduce artifact information, which makes diagnosis by a physician difficult. In practical clinical applications especially, owing to the limitations of space and of the detection environment, the ultrasound probe often covers only a portion of the patient, causing a limited-view problem. In this case, conventional reconstruction algorithms further cause the biological tissue information to be mixed into the artifacts and become unresolvable.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: under the limited-view condition, existing photoacoustic image reconstruction algorithms lose information and introduce artifact information.
To solve the above technical problem, the technical scheme of the invention is to provide a mutual information-based limited-view photoacoustic image reconstruction method, characterized by comprising the following steps:
for the ultrasonic signals received by the ultrasound probe, a DAS image x_i is obtained using a time-domain reconstruction algorithm, while a k-space image x_k is obtained using a frequency-domain reconstruction algorithm;
the DAS image x_i and the k-space image x_k are input into a time-frequency domain U-shaped network model to reconstruct the photoacoustic image, wherein the time-frequency domain U-shaped network model adopts an encoder-decoder structure:
after the DAS image x_i is input into the encoder, the real part of the complex result of a fast Fourier transform is taken and concatenated along the channel dimension with the k-space image x_k input into the encoder, and the result is input into the first information sharing module; after the k-space image x_k is input into the encoder, the real part of the complex result of an inverse fast Fourier transform is taken and concatenated along the channel dimension with the DAS image x_i input into the encoder, and the result is input into the first information sharing module;
the first time-domain output and the first frequency-domain output obtained by the first information sharing module pass through the first maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the second information sharing module; the second time-domain output and the second frequency-domain output obtained by the second information sharing module pass through the second maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the third information sharing module; the third time-domain output and the third frequency-domain output obtained by the third information sharing module pass through the third maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the fourth information sharing module; after the fourth time-domain output and the fourth frequency-domain output of the fourth information sharing module pass through the fourth maximum pooling layer, the fourth frequency-domain output is inverse-Fourier-transformed and connected with the fourth time-domain output on the channel dimension, and the intermediate latent representation z_2 is then obtained through convolution kernel calculation; the intermediate latent representation z_2 is the final output of the encoder;
the first information sharing module, the second information sharing module, the third information sharing module and the fourth information sharing module perform the same processing on their time-domain input and frequency-domain input; denoting any one of them as the information sharing module, the following steps are performed:
the information sharing module extracts shallow features of time domain input and outputs the shallow features to a decoder;
the information sharing module interactively fuses the extracted high-level features of the time domain input, the extracted high-level features of the frequency domain input after being converted into the time domain and the original features of the time domain input to form a first time domain output, a second time domain output, a third time domain output or a fourth time domain output;
the information sharing module interactively fuses the extracted high-level features of the frequency domain input, the extracted high-level features of the time domain input after being converted into the frequency domain and the original features of the frequency domain input to form a first frequency domain output, a second frequency domain output, a third frequency domain output or a fourth frequency domain output;
the decoder recovers the intermediate latent representation z_2 output by the encoder into a photoacoustic image.
Preferably, the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer and the fourth maximum pooling layer are all 2 × 2 maximum pooling layers.
Preferably, the information sharing module directly outputs the time domain input to extract its shallow features, and this output is connected to the corresponding decoder portion after two 3 × 3 convolution kernel calculations to retain the shallow information of the time domain, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
Preferably, the time domain input is subjected to two convolution kernel calculations of 3 × 3 and two residual convolution layer calculations in the information sharing module to extract high-level features of the time domain input, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations;
the time domain input is subjected to fast Fourier transform in the information sharing module, and then is subjected to two 3 x 3 convolution kernel calculations and two residual convolution layer calculations to extract high-level characteristics of the time domain input transformed to a frequency domain, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
the time domain input is not subjected to any computation within the information sharing module to preserve the original characteristics of the time domain input.
Preferably, the high-level features of the time domain input, the high-level features of the frequency domain input after being transformed into the time domain, and the original features of the time domain input are subjected to two 3 × 3 convolution kernel calculations after channel connection to obtain a first time domain output, a second time domain output, a third time domain output, or a fourth time domain output, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
Preferably, the frequency domain input is subjected to two convolution kernel calculations of 3 × 3 and two residual convolution layer calculations in the information sharing module to extract high-level features of the frequency domain input, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations;
the frequency domain input is subjected to inverse fast Fourier transform in the information sharing module, and then is subjected to two 3 x 3 convolution kernel calculations and two residual convolution layer calculations to extract high-level features of the frequency domain input after being transformed into a time domain, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
the frequency domain input is not subjected to any computation within the information sharing module to preserve the original characteristics of the frequency domain input.
Preferably, the high-level features of the frequency domain input, the high-level features of the time domain input after being transformed into the frequency domain, and the original features of the frequency domain input are subjected to two 3 × 3 convolution kernel calculations after channel connection to obtain a first frequency domain output, a second frequency domain output, a third frequency domain output, or a fourth frequency domain output, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
Preferably, the intermediate latent representation z_2 is processed in the decoder through the following steps:
step 1, the intermediate latent representation z_2 first goes through an upsampling convolution, is then spliced on the channel dimension with the shallow features output by the fourth information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolution operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 2, the output obtained in step 1 is spliced on the channel dimension with the shallow features output by the third information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolutional layer operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 3, the output obtained in step 2 is spliced on the channel dimension with the shallow features output by the second information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolutional layer operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 4, the shallow features output by the first information sharing module are spliced on the channel dimension with the output obtained in step 3, which then goes through two successive 3 × 3 convolution kernel calculations and a final 1 × 1 convolution kernel calculation to obtain the final photoacoustic image, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation.
Preferably, a mutual information constraint on the intermediate latent variables is used to compensate prior knowledge in the training phase of the time-frequency domain U-shaped network model.
Preferably, compensating the prior knowledge using the mutual information constraint on the intermediate latent variables specifically comprises the following steps:
in the first step, the reconstruction objective function of the time-frequency domain U-shaped network model is given by the following formula (1):

    max_θ E_θ[ -||f_θ(x_i, x_k) - y||^2 ]    (1)

in formula (1), θ represents all trainable parameters in the time-frequency domain U-shaped network model; y represents the true photoacoustic image; E_θ[·] represents the mathematical expectation with respect to the parameter θ; f_θ(·) represents the time-frequency domain U-shaped network model;
in the second step, an auxiliary network is pre-trained to provide the intermediate latent representation z_1 used in the mutual information constraint;
in the third step, after the mutual information constraint is added, the reconstruction objective function of the time-frequency domain U-shaped network model in its training stage is represented by the following formula (2):

    max_θ { E_θ[ -||f_θ(x_i, x_k) - y||^2 ] + λ I(z_1, z_2) }    (2)

in formula (2), I(·,·) represents mutual information; λ represents the penalty coefficient of the mutual information constraint; z_2 is the intermediate latent representation output by the encoder of the time-frequency domain U-shaped network model;
in the fourth step, the variational lower bound of the objective function in formula (2) is optimized:

with p(z_1|z_2) representing the true conditional distribution, the variational distribution q(z_1|z_2) is used as an approximate substitute for p(z_1|z_2); the mutual information term I(z_1, z_2) is then represented by the following formula (3):

    I(z_1, z_2) = H(z_1) - H(z_1|z_2)
                = H(z_1) + E_{z_1,z_2}[ log q(z_1|z_2) ] + E_{z_2}[ D_KL( p(z_1|z_2) || q(z_1|z_2) ) ]    (3)

in formula (3), H(z_1) represents the information entropy of the intermediate latent representation z_1; H(z_1|z_2) represents the conditional entropy of the intermediate latent representation z_1 given the intermediate latent representation z_2; E_{z_1,z_2}[·] represents the mathematical expectation with respect to the parameters z_1, z_2; D_KL(·||·) represents the KL divergence;

since z_1 remains unchanged during training and the KL divergence is non-negative, the variational lower bound of the reconstruction objective function is obtained from formula (3), as shown in the following formula (4):

    max_θ { E_θ[ -||f_θ(x_i, x_k) - y||^2 ] + λ E_{z_1,z_2}[ log q(z_1|z_2) ] }    (4)
in the fifth step, a Gaussian distribution is selected as the variational distribution q(z_1|z_2), as shown in the following formula (5):

    q(z_1|z_2) = N( z_1 ; μ(z_2), σ(z_2) )    (5)

in formula (5), μ(z_2) represents the mean of the Gaussian distribution and is calculated from the intermediate latent variable z_2 by a 1 × 1 convolution layer;
σ(z_2) represents the variance of the Gaussian distribution and is calculated using the following formula (6):

    σ(z_2) = (1/(H·W)) Σ_{h,w} ( z_2 - μ(z_2) )² + ε,   σ(z_2) ∈ R^C    (6)
in formula (6), C, H, W represent the channel number, height and width of z_1 and z_2; ε is a positive constant; R^C represents the real space of dimension C;
in the sixth step, after substituting formula (6) into formula (5) and taking the logarithm, the variational distribution q(z_1|z_2) is represented by the following formula (7):

    log q(z_1|z_2) = -(1/2) Σ_{c=1}^{C} [ ( z_1 - μ(z_2) )² / σ(z_2) + log σ(z_2) ]_c + c    (7)

in formula (7), c is a constant. Formula (7) rewrites the intractable mutual information term as a variational mutual information term that is easy to solve and to implement in code; through formula (7), the intermediate latent variables of the time-frequency domain U-shaped network model can learn more supervision information from the encoder, achieving the effect of prior-knowledge compensation.
The invention provides a method in which images of the image domain and of k-space, obtained with time-domain and frequency-domain algorithms, serve as the inputs of a post-processing network. The time-frequency domain U-shaped network model designed by the invention combines the tissue-structure information of the two inputs and removes artifacts, while the mutual information constraint provides the network with rich prior knowledge. Experiments prove that, on the limited-view photoacoustic reconstruction problem, the method is superior to existing mature and recent reconstruction algorithms.
Drawings
FIG. 1 is a general schematic diagram of DuDoUnet;
FIG. 2 is a schematic diagram of an ISB structure;
FIG. 3 is a general diagram of a mutual information constraint framework;
FIG. 4 shows the simulation results (three regions are marked and enlarged below each picture, where the differences in reconstruction are evident).
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Aiming at the limited-view photoacoustic reconstruction problem, the invention provides a mutual information-based limited-view photoacoustic image reconstruction method, which comprises the following steps:
For the ultrasonic signals received by the ultrasound probe, a DAS image x_i is obtained using a time-domain reconstruction algorithm, while a k-space image x_k is obtained using a frequency-domain reconstruction algorithm. The two reconstruction algorithms reconstruct the same biological tissue information but introduce different artifact information, which teaches the post-processing network, at the input level, which parts are tissue and which are artifacts.
In order to further distinguish these two kinds of information, the invention designs a variant of Unet, namely a time-frequency domain U-shaped network (Dual Domain Unet, hereinafter the DuDoUnet model), whose overall structure is shown in FIG. 1.
The DuDoUnet model takes as inputs the DAS image x_i obtained by the time-domain reconstruction algorithm and the k-space image x_k obtained by the frequency-domain reconstruction algorithm. In FIG. 1, FFT and IFFT denote the fast Fourier transform and the inverse fast Fourier transform, respectively, and Concat denotes that the inputs are connected in the channel dimension. This allows the two inputs, from the image domain and from k-space, to share information with each other and to distinguish artifacts. The whole DuDoUnet model is regarded as an encoder-decoder structure, where the specific structure and operation flow of the encoder are as follows:
The DAS image x_i has size 128 × 128 × 1 and the k-space image x_k has size 128 × 128 × 2, where the two channels are the real and imaginary parts of the k-space image x_k. After the fast Fourier transform of the DAS image x_i, the real part of the complex result is taken, with size 128 × 128 × 1; after concatenation with the k-space image x_k along the channel dimension, the size becomes 128 × 128 × 3. Likewise, after the inverse fast Fourier transform of the k-space image x_k, the real part of the complex result is taken, with size 128 × 128 × 1; after concatenation with the DAS image x_i along the channel dimension, the size becomes 128 × 128 × 2.
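As an illustration of the input preparation just described, a minimal sketch in PyTorch follows; the framework, the function name prepare_dual_domain_inputs, and the (batch, channel, height, width) tensor layout are our assumptions, not part of the patent.

```python
import torch

def prepare_dual_domain_inputs(x_i, x_k):
    """Builds the two encoder inputs from the DAS and k-space images.

    x_i: DAS image, shape (B, 1, 128, 128), real-valued.
    x_k: k-space image, shape (B, 2, 128, 128); the two channels are the
         real and imaginary parts.
    """
    # Frequency-domain branch: real part of the FFT of the DAS image,
    # concatenated with the k-space image -> (B, 3, 128, 128)
    x_i_fft_real = torch.fft.fft2(x_i).real
    freq_in = torch.cat([x_i_fft_real, x_k], dim=1)

    # Time-domain branch: real part of the inverse FFT of the k-space image,
    # concatenated with the DAS image -> (B, 2, 128, 128)
    x_k_complex = torch.complex(x_k[:, 0:1], x_k[:, 1:2])
    x_k_ifft_real = torch.fft.ifft2(x_k_complex).real
    time_in = torch.cat([x_i, x_k_ifft_real], dim=1)
    return time_in, freq_in
```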
The time-frequency domain input then enters a first Information Sharing Block (ISB). The specific structure of ISB is shown in fig. 2.
The ISB has two inputs, a time domain input and a frequency domain input. The time domain input in ISB accomplishes two different tasks:
task 1: and (4) directly outputting. This part is used to extract shallow features in the time domain.
Task 2: the method is divided into three parts for time domain feature extraction and time domain to frequency domain feature conversion, and further comprises the following steps:
task 2.1: two 3 x 3 convolution kernel calculations and two residual convolution layer calculations are performed. Each convolution operation is additionally followed by batch normalization and ReLU activation operations. This part is used to extract high-level features in the time domain.
Task 2.2: and (4) directly outputting. This part is used to preserve the original characteristics of the time domain input layer.
Task 2.3: after the fast Fourier transform, two 3 × 3 convolution kernel calculations and two residual convolution layer calculations are performed. This part is used to extract the high-level features after the time domain is transformed into the frequency domain.
It is noted that although task 1 and task 2.2 are identical, the corresponding output operations are different and the degree of importance is different.
ISB only requires the frequency domain input to complete one task, namely task 3:
task 3: the method is divided into three parts for frequency domain feature extraction and frequency domain to time domain feature conversion, and further comprises the following steps:
task 3.1: two 3 x 3 convolution kernel calculations and two residual convolution layer calculations are performed. This part is used to extract the high-level features of the frequency domain.
Task 3.2: and (4) directly outputting. This part is used to preserve the original characteristics of the frequency domain input layer.
Task 3.3: after the inverse fast Fourier transform, two 3 × 3 convolution kernel calculations and two residual convolution layer calculations are performed. This part is used to extract the high-level features after the frequency domain is transformed into the time domain.
The ISB has three outputs, namely a connection output, a time domain output, and a frequency domain output:
and (3) connection and output: the output in task 1 is subjected to two convolution kernel calculations of 3 × 3 and then connected to the corresponding decoder portion to retain the shallow information in the time domain.
And (3) time domain output: and connecting the outputs of the task 2.1, the task 2.2 and the task 3.3 on a channel, and outputting after calculating by two convolution kernels of 3 multiplied by 3, wherein the part ensures the interactive fusion of time domain information.
And (3) frequency domain output: and connecting the outputs of the task 3.1, the task 3.2 and the task 2.3 on a channel, and outputting after calculating by two convolution kernels of 3 multiplied by 3, wherein the part ensures the interactive fusion of frequency domain information.
The ISB is a general-purpose module in the DuDoUnet model: it can adapt to different numbers of input channels and output features with the required number of channels. Its unique time-frequency domain transformations ensure that the photoacoustic image information of the image domain and of k-space is shared, while the different tasks do not interfere with each other on account of their time-frequency domain characteristics.
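A condensed sketch of one ISB, again assuming PyTorch, is given below. The class names and the exact composition of the residual convolution layers are our assumptions; taking the real part after the internal FFT/IFFT mirrors the input preparation and is likewise an assumption. The task structure follows the description above.

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out):
    # 3x3 convolution followed by batch normalization and ReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ResidualConv(nn.Module):
    # One residual convolution layer (simplified; the patent does not detail it)
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(conv_bn_relu(c, c), conv_bn_relu(c, c))
    def forward(self, x):
        return x + self.body(x)

class ISB(nn.Module):
    def __init__(self, c_time_in, c_freq_in, c_out):
        super().__init__()
        def deep(c_in):  # two 3x3 convs + two residual convolution layers
            return nn.Sequential(conv_bn_relu(c_in, c_out), conv_bn_relu(c_out, c_out),
                                 ResidualConv(c_out), ResidualConv(c_out))
        # Task 1: shallow path, connected to the decoder after two 3x3 convs
        self.shallow = nn.Sequential(conv_bn_relu(c_time_in, c_out),
                                     conv_bn_relu(c_out, c_out))
        self.time_deep = deep(c_time_in)   # task 2.1
        self.time2freq = deep(c_time_in)   # task 2.3 (applied after FFT)
        self.freq_deep = deep(c_freq_in)   # task 3.1
        self.freq2time = deep(c_freq_in)   # task 3.3 (applied after IFFT)
        # Fusion convs producing the time-domain and frequency-domain outputs
        self.fuse_t = nn.Sequential(conv_bn_relu(2 * c_out + c_time_in, c_out),
                                    conv_bn_relu(c_out, c_out))
        self.fuse_f = nn.Sequential(conv_bn_relu(2 * c_out + c_freq_in, c_out),
                                    conv_bn_relu(c_out, c_out))

    def forward(self, t_in, f_in):
        skip = self.shallow(t_in)                            # connection output
        t_feat = self.time_deep(t_in)                        # task 2.1
        t_as_f = self.time2freq(torch.fft.fft2(t_in).real)   # task 2.3
        f_feat = self.freq_deep(f_in)                        # task 3.1
        f_as_t = self.freq2time(torch.fft.ifft2(f_in).real)  # task 3.3
        # Time-domain output: tasks 2.1 + 3.3 + original time input (task 2.2)
        t_out = self.fuse_t(torch.cat([t_feat, f_as_t, t_in], dim=1))
        # Frequency-domain output: tasks 3.1 + 2.3 + original frequency input (task 3.2)
        f_out = self.fuse_f(torch.cat([f_feat, t_as_f, f_in], dim=1))
        return skip, t_out, f_out

# First ISB of the encoder: 2 time channels, 3 frequency channels, 64 output channels
# isb1 = ISB(c_time_in=2, c_freq_in=3, c_out=64)
```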
After the first ISB, connection output one is obtained, and time-domain output one and frequency-domain output one both have size 128 × 128 × 64. Time-domain output one and frequency-domain output one pass through a 2 × 2 maximum pooling layer, and their sizes both become 64 × 64 × 64. These two outputs then serve as the time-domain input and frequency-domain input of the second ISB; after the second ISB is calculated, connection output two is obtained, and time-domain output two and frequency-domain output two both have size 64 × 64 × 128. Time-domain output two and frequency-domain output two pass through a 2 × 2 maximum pooling layer, and their sizes both become 32 × 32 × 128. These two outputs then serve as the time-domain input and frequency-domain input of the third ISB; after the third ISB is calculated, connection output three is obtained, and time-domain output three and frequency-domain output three both have size 32 × 32 × 256. Time-domain output three and frequency-domain output three pass through a 2 × 2 maximum pooling layer, and their sizes both become 16 × 16 × 256. These two outputs then serve as the time-domain input and frequency-domain input of the fourth ISB; after the fourth ISB is calculated, connection output four is obtained, and time-domain output four and frequency-domain output four both have size 16 × 16 × 512. Time-domain output four and frequency-domain output four pass through a 2 × 2 maximum pooling layer, and their sizes become 8 × 8 × 512. Frequency-domain output four is then inverse-Fourier-transformed and concatenated with time-domain output four along the channel dimension, and the size becomes 8 × 8 × 1024. After two 3 × 3 convolution kernel calculations, the intermediate latent representation z_2, i.e. the final output of the encoder, is obtained, with size 8 × 8 × 512.
The specific structure and operation flow of the decoder are as follows:
The intermediate latent representation z_2 passes through a 2 × 2 upsampling convolutional layer and its size becomes 16 × 16 × 512; after splicing with connection output four of the fourth ISB along the channel dimension, an output of size 16 × 16 × 1024 is obtained. This output continues through two 3 × 3 convolution kernel calculations and its size becomes 16 × 16 × 512. The output then goes through a 2 × 2 upsampling convolutional layer operation, and the size becomes 32 × 32 × 256. After splicing with connection output three of the third ISB along the channel dimension, an output of size 32 × 32 × 512 is obtained. This output continues through two 3 × 3 convolution kernel calculations and its size becomes 32 × 32 × 256. The output then goes through a 2 × 2 upsampling convolutional layer operation, and the size becomes 64 × 64 × 128. After splicing with connection output two of the second ISB along the channel dimension, an output of size 64 × 64 × 256 is obtained. This output continues through two 3 × 3 convolution kernel calculations and its size becomes 64 × 64 × 128. The output then goes through a 2 × 2 upsampling convolutional layer operation, and the size becomes 128 × 128 × 64. After splicing with connection output one of the first ISB along the channel dimension, an output of size 128 × 128 × 128 is obtained. This output continues through two 3 × 3 convolution kernel calculations and its size becomes 128 × 128 × 64. Finally, the output goes through one 1 × 1 convolution kernel calculation to obtain the final output, whose size equals that of the ground-truth image (GT), namely 128 × 128 × 1.
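One decoder stage can be sketched as follows, assuming PyTorch; the class name DecoderStage and the use of ConvTranspose2d for the 2 × 2 upsampling convolution are our assumptions.

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """2x2 up-convolution, channel splice with the ISB connection output,
    then two 3x3 conv + batch-norm + ReLU blocks."""
    def __init__(self, c_in, c_up, c_skip, c_out):
        super().__init__()
        self.up = nn.ConvTranspose2d(c_in, c_up, kernel_size=2, stride=2)
        self.convs = nn.Sequential(
            nn.Conv2d(c_up + c_skip, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

    def forward(self, x, skip):
        return self.convs(torch.cat([self.up(x), skip], dim=1))

# Following the sizes above, the first stage maps z2 (8x8x512) plus connection
# output four (16x16x512) to 16x16x512: stage1 = DecoderStage(512, 512, 512, 512)
```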
We use the real photoacoustic image y as the supervision label, f_θ(·) to denote the DuDoUnet model, and θ to denote all trainable parameters in the DuDoUnet model. The overall reconstruction objective function of DuDoUnet can be expressed as:

    max_θ E_θ[ -||f_θ(x_i, x_k) - y||^2 ]    (1)
In the experiments, we chose the mean squared error (MSE) as the reconstruction loss function.
The limited-view photoacoustic inputs can learn prior knowledge only through the supervision label; however, as the network deepens, the vanishing-gradient problem easily occurs, and if the decoder of the network is powerful enough, the encoder cannot learn useful information, at which point the reconstruction model degenerates into a generative model. In this regard, the invention proposes to compensate prior knowledge with a mutual information constraint on the intermediate latent variables during model training. The overall framework is shown in FIG. 3.
To make the encoder of the DuDoUnet model learn a rich prior, an auxiliary network (an autoencoder in our experiments) is pre-trained to provide an intermediate latent representation z_1, i.e. the output of the encoder in the autoencoder, and mutual information is then used to constrain z_1 and the intermediate latent representation z_2 extracted by the DuDoUnet model (the output of the encoder in the DuDoUnet model). The mutual information constraint ensures that the encoder of the DuDoUnet model can learn more prior knowledge not only from the supervision label but also through the intermediate latent representation, thereby achieving the effect of prior-knowledge compensation.
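The text specifies the auxiliary network only as an autoencoder whose encoder output is z_1; a minimal sketch consistent with the 8 × 8 × 512 size of z_2 might look like the following, where every layer choice is an assumption.

```python
import torch.nn as nn

class AuxAutoencoder(nn.Module):
    """Pre-trained, e.g. to reconstruct the ground-truth image y; its encoder
    output z1 is then kept fixed and used in the mutual information constraint."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(   # 128x128x1 -> 8x8x512
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(   # 8x8x512 -> 128x128x1
            nn.ConvTranspose2d(512, 256, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 2, stride=2))

    def forward(self, y):
        z1 = self.encoder(y)            # intermediate latent representation z1
        return self.decoder(z1), z1
```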
After the mutual information constraint is added, in the model training phase, the overall objective function can be expressed as:
Figure BDA0002760120970000092
in formula (2), I (·,) represents mutual information; λ represents a penalty coefficient of mutual information constraint;
Figure BDA0002760120970000101
All parameters of the model can be obtained by maximizing formula (2). Solving formula (2) directly, however, is very difficult, so we optimize only its variational lower bound. We use p(z_1|z_2) to represent the true conditional distribution; since p(z_1|z_2) is usually unknown, we use the variational distribution q(z_1|z_2) as an approximate substitute for it. The mutual information term can thus be expressed as:

    I(z_1, z_2) = H(z_1) - H(z_1|z_2)
                = H(z_1) + E_{z_1,z_2}[ log q(z_1|z_2) ] + E_{z_2}[ D_KL( p(z_1|z_2) || q(z_1|z_2) ) ]    (3)
in the formula (3), H (z)1) Representing intermediate potential tokens z1The entropy of the information of (1); h (z)1|z2) Representing a given intermediate potential representation z2Potential characterization of the posterior middle z1The conditional entropy of (a);
Figure BDA0002760120970000103
is expressed with respect to a parameter z1,z2A mathematical expectation of (d);
Figure BDA0002760120970000104
the representation represents the KL divergence. Because of z1Is an intermediate potential characterization provided by the pre-training network, so z is the training process1Remaining unchanged, and the KL divergence being non-negative, we can remove these two terms to obtain the lower bound of the variation of the objective function as shown in equation (4):
Figure BDA0002760120970000105
we select the Gaussian distribution as the variational distribution q (z)1|z2):
Figure BDA0002760120970000106
Here μ(z_2) and σ(z_2) represent the mean and variance of the Gaussian distribution. Computing the intermediate latent variable z_2 with a 1 × 1 convolution layer gives the mean μ(z_2) of the Gaussian distribution. Calculating the variance σ(z_2) with the same method, however, would make the network training unstable, so we use formula (6) to calculate the variance:

    σ(z_2) = (1/(H·W)) Σ_{h,w} ( z_2 - μ(z_2) )² + ε,   σ(z_2) ∈ R^C    (6)
In formula (6), C, H, W represent the channel number, height and width of z_1 and z_2; ε is a positive constant; R^C represents the real space of dimension C.
After substituting equation (6) into equation (5) and taking the logarithm, the variation mutual information item can be expressed as:
Figure BDA0002760120970000111
in the formula (7), c is a constant.
Unlike the MSE, formula (7) allows the network to impose penalty constraints of different strengths on different layers; when the variance σ(z_2) is identically 1, formula (7) is equivalent to the MSE.
Formula (7) rewrites the intractable mutual information term as a variational mutual information term that is easy to solve and to implement in code. Through formula (7), the intermediate latent variables of the DuDoUnet model can learn more supervision information from the encoder, achieving the effect of prior-knowledge compensation.
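Putting formulas (2), (6) and (7) together, the training loss can be sketched as follows, assuming PyTorch; the per-channel form of formula (6) is our reading of the text, and all helper names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

mu_conv = nn.Conv2d(512, 512, kernel_size=1)  # 1x1 conv producing mu(z2)

def variational_mi_term(z1, z2, eps=1.0):
    """E[log q(z1|z2)] up to an additive constant, per formula (7).

    z1: fixed latent from the pre-trained autoencoder, shape (B, C, H, W).
    z2: latent from the DuDoUnet encoder, same shape.
    """
    mu = mu_conv(z2)
    # Formula (6) (our reading): per-channel spatial average of (z2 - mu)^2 + eps
    var = ((z2 - mu) ** 2).mean(dim=(2, 3), keepdim=True) + eps
    log_q = -0.5 * ((z1 - mu) ** 2 / var + torch.log(var))
    return log_q.mean()

def total_loss(y_pred, y, z1, z2, lam=1.0):
    # Maximizing formula (4) is equivalent to minimizing MSE - lambda * E[log q];
    # with var identically 1, the MI term reduces to an MSE between z1 and mu(z2).
    return F.mse_loss(y_pred, y) - lam * variational_mi_term(z1, z2)
```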
To demonstrate the effectiveness of the invention, we performed experimental validation on an open dataset. We selected 2000 samples in total, of which 1500 were used for training and 500 were reserved for testing. Photoacoustic signals were generated using the k-Wave photoacoustic simulation toolbox, and x_i and x_k were obtained through the time-domain and frequency-domain reconstruction algorithms. We then obtained the reconstructed photoacoustic images using the mutual-information-constrained DuDoUnet model. In this experiment, the model parameters λ and ε were both set to 1.
To further compare the effectiveness of the method and the model, common reconstruction models such as Unet, Ynet and DEUnet were selected for comparison with our model. Unet is divided into Unet #1 and Unet #2: Unet #1 takes only x_i as input, while Unet #2 takes both x_i and x_k as input but uses a single-path encoder. The Ynet and DEUnet models have a dual-encoder structure, taking x_i and x_k as inputs and extracting information with a separate encoder for each.
We compared the reconstruction results using the structural similarity (SSIM) and the peak signal-to-noise ratio (PSNR).

[Table (image in the original): SSIM and PSNR of Unet #1, Unet #2, Ynet, DEUnet, DuDoUnet and DuDoUnet + MI.]
Without the mutual information constraint, the reconstruction results of the DuDoUnet model already exceed those of the other comparison models; after the mutual information constraint is added, the model obtains even better reconstruction results.
We further compare the reconstructed photoacoustic images visually, as shown in FIG. 4. As can be seen from FIG. 4, DuDoUnet and DuDoUnet + MI restore the structural information of the blood vessels better under the same input conditions.

Claims (10)

1. A mutual information-based limited-view photoacoustic image reconstruction method, characterized by comprising the following steps:
for the ultrasonic signals received by the ultrasound probe, obtaining a DAS image x_i using a time-domain reconstruction algorithm, while obtaining a k-space image x_k using a frequency-domain reconstruction algorithm;
inputting the DAS image x_i and the k-space image x_k into a time-frequency domain U-shaped network model to reconstruct the photoacoustic image, wherein the time-frequency domain U-shaped network model adopts an encoder-decoder structure:
after the DAS image x_i is input into the encoder, the real part of the complex result of a fast Fourier transform is taken and concatenated along the channel dimension with the k-space image x_k input into the encoder, and the result is input into the first information sharing module; after the k-space image x_k is input into the encoder, the real part of the complex result of an inverse fast Fourier transform is taken and concatenated along the channel dimension with the DAS image x_i input into the encoder, and the result is input into the first information sharing module;
the first time-domain output and the first frequency-domain output obtained by the first information sharing module pass through the first maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the second information sharing module; the second time-domain output and the second frequency-domain output obtained by the second information sharing module pass through the second maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the third information sharing module; the third time-domain output and the third frequency-domain output obtained by the third information sharing module pass through the third maximum pooling layer and then serve as the time-domain input and the frequency-domain input of the fourth information sharing module; after the fourth time-domain output and the fourth frequency-domain output of the fourth information sharing module pass through the fourth maximum pooling layer, the fourth frequency-domain output is inverse-Fourier-transformed and connected with the fourth time-domain output on the channel dimension, and the intermediate latent representation z_2 is then obtained through convolution kernel calculation; the intermediate latent representation z_2 is the final output of the encoder;
the first information sharing module, the second information sharing module, the third information sharing module and the fourth information sharing module perform the same processing on their time-domain input and frequency-domain input; denoting any one of them as the information sharing module, the following steps are performed:
the information sharing module extracts shallow features of time domain input and outputs the shallow features to a decoder;
the information sharing module interactively fuses the extracted high-level features of the time domain input, the extracted high-level features of the frequency domain input after being converted into the time domain and the original features of the time domain input to form a first time domain output, a second time domain output, a third time domain output or a fourth time domain output;
the information sharing module interactively fuses the extracted high-level features of the frequency domain input, the extracted high-level features of the time domain input after being converted into the frequency domain and the original features of the frequency domain input to form a first frequency domain output, a second frequency domain output, a third frequency domain output or a fourth frequency domain output;
the decoder recovers the intermediate latent representation z_2 output by the encoder into a photoacoustic image.
2. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer and the fourth maximum pooling layer are all 2 × 2 maximum pooling layers.
3. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the information sharing module directly outputs the time domain input to extract its shallow features, and this output is connected to the corresponding decoder portion after two 3 × 3 convolution kernel calculations to retain the shallow information of the time domain, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
4. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the time domain input is subjected to two 3 × 3 convolution kernel calculations and two residual convolution layer calculations in the information sharing module to extract the high-level features of the time domain input, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations;
the time domain input is subjected to fast Fourier transform in the information sharing module, and then is subjected to two 3 x 3 convolution kernel calculations and two residual convolution layer calculations to extract high-level characteristics of the time domain input transformed to a frequency domain, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
the time domain input is not subjected to any computation within the information sharing module to preserve the original characteristics of the time domain input.
5. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the high-level features of the time domain input, the high-level features of the frequency domain input after being transformed into the time domain, and the original features of the time domain input are subjected to two 3 × 3 convolution kernel calculations after channel connection to obtain the first time domain output, the second time domain output, the third time domain output, or the fourth time domain output, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
6. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the frequency domain input is subjected to two 3 × 3 convolution kernel calculations and two residual convolution layer calculations in the information sharing module to extract the high-level features of the frequency domain input, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations;
the frequency domain input is subjected to inverse fast Fourier transform in the information sharing module, and then is subjected to two 3 x 3 convolution kernel calculations and two residual convolution layer calculations to extract high-level features of the frequency domain input after being transformed into a time domain, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
the frequency domain input is not subjected to any computation within the information sharing module to preserve the original characteristics of the frequency domain input.
7. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the high-level features of the frequency domain input, the high-level features of the time domain input after being transformed into the frequency domain, and the original features of the frequency domain input are subjected to two 3 × 3 convolution kernel calculations after channel connection to obtain the first frequency domain output, the second frequency domain output, the third frequency domain output, or the fourth frequency domain output, wherein each convolution operation is additionally followed by batch normalization and ReLU activation operations.
8. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein the intermediate latent representation z_2 is processed in the decoder through the following steps:
step 1, the intermediate latent representation z_2 first goes through an upsampling convolution, is then spliced on the channel dimension with the shallow features output by the fourth information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolution operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 2, the output obtained in step 1 is spliced on the channel dimension with the shallow features output by the third information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolutional layer operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 3, the output obtained in step 2 is spliced on the channel dimension with the shallow features output by the second information sharing module, then goes through two successive 3 × 3 convolution kernel calculations, and then goes through an upsampling convolutional layer operation, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation;
step 4, the shallow features output by the first information sharing module are spliced on the channel dimension with the output obtained in step 3, which then goes through two successive 3 × 3 convolution kernel calculations and a final 1 × 1 convolution kernel calculation to obtain the final photoacoustic image, wherein batch normalization and ReLU activation operations are additionally added after each convolution operation.
9. The mutual information-based limited-view photoacoustic image reconstruction method of claim 1, wherein a mutual information constraint on the intermediate latent variables is used to compensate prior knowledge in the training phase of the time-frequency domain U-shaped network model.
10. The mutual information-based limited-view photoacoustic image reconstruction method of claim 9, wherein compensating the prior knowledge using the mutual information constraint on the intermediate latent variables specifically comprises the following steps:
in the first step, the reconstruction objective function of the time-frequency domain U-shaped network model is given by the following formula (1):

    max_θ E_θ[ -||f_θ(x_i, x_k) - y||^2 ]    (1)

in formula (1), θ represents all trainable parameters in the time-frequency domain U-shaped network model; y represents the true photoacoustic image; E_θ[·] represents the mathematical expectation with respect to the parameter θ; f_θ(·) represents the time-frequency domain U-shaped network model;
in the second step, an auxiliary network is pre-trained to provide the intermediate latent representation z_1 used in the mutual information constraint;
in the third step, after the mutual information constraint is added, the reconstruction objective function of the time-frequency domain U-shaped network model in its training stage is represented by the following formula (2):

    max_θ { E_θ[ -||f_θ(x_i, x_k) - y||^2 ] + λ I(z_1, z_2) }    (2)

in formula (2), I(·,·) represents mutual information; λ represents the penalty coefficient of the mutual information constraint; z_2 is the intermediate latent representation output by the encoder of the time-frequency domain U-shaped network model;
in the fourth step, the variational lower bound of the objective function in formula (2) is optimized:

with p(z_1|z_2) representing the true conditional distribution, the variational distribution q(z_1|z_2) is used as an approximate substitute for p(z_1|z_2); the mutual information term I(z_1, z_2) is then represented by the following formula (3):

    I(z_1, z_2) = H(z_1) - H(z_1|z_2)
                = H(z_1) + E_{z_1,z_2}[ log q(z_1|z_2) ] + E_{z_2}[ D_KL( p(z_1|z_2) || q(z_1|z_2) ) ]    (3)

in formula (3), H(z_1) represents the information entropy of the intermediate latent representation z_1; H(z_1|z_2) represents the conditional entropy of the intermediate latent representation z_1 given the intermediate latent representation z_2; E_{z_1,z_2}[·] represents the mathematical expectation with respect to the parameters z_1, z_2; D_KL(·||·) represents the KL divergence;

since z_1 remains unchanged during training and the KL divergence is non-negative, the variational lower bound of the reconstruction objective function is obtained from formula (3), as shown in the following formula (4):

    max_θ { E_θ[ -||f_θ(x_i, x_k) - y||^2 ] + λ E_{z_1,z_2}[ log q(z_1|z_2) ] }    (4)
in the fifth step, a Gaussian distribution is selected as the variational distribution q(z_1|z_2), as shown in the following formula (5):

    q(z_1|z_2) = N( z_1 ; μ(z_2), σ(z_2) )    (5)

in formula (5), μ(z_2) represents the mean of the Gaussian distribution and is calculated from the intermediate latent variable z_2 by a 1 × 1 convolution layer;
σ(z_2) represents the variance of the Gaussian distribution and is calculated using the following formula (6):

    σ(z_2) = (1/(H·W)) Σ_{h,w} ( z_2 - μ(z_2) )² + ε,   σ(z_2) ∈ R^C    (6)
in formula (6), C, H, W represent the channel number, height and width of z_1 and z_2; ε is a positive constant; R^C represents the real space of dimension C;
in the sixth step, after substituting formula (6) into formula (5) and taking the logarithm, the variational distribution q(z_1|z_2) is represented by the following formula (7):

    log q(z_1|z_2) = -(1/2) Σ_{c=1}^{C} [ ( z_1 - μ(z_2) )² / σ(z_2) + log σ(z_2) ]_c + c    (7)

in formula (7), c is a constant. Formula (7) rewrites the intractable mutual information term as a variational mutual information term that is easy to solve and to implement in code; through formula (7), the intermediate latent variables of the time-frequency domain U-shaped network model can learn more supervision information from the encoder, achieving the effect of prior-knowledge compensation.
CN202011215182.6A 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method Active CN112308941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011215182.6A CN112308941B (en) 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011215182.6A CN112308941B (en) 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method

Publications (2)

Publication Number Publication Date
CN112308941A (en) 2021-02-02
CN112308941B CN112308941B (en) 2023-06-20

Family

ID=74325545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011215182.6A Active CN112308941B (en) 2020-11-04 2020-11-04 Mutual information-based limited view photoacoustic image reconstruction method

Country Status (1)

Country Link
CN (1) CN112308941B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140198606A1 (en) * 2013-01-15 2014-07-17 Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH) System and method for quality-enhanced high-rate optoacoustic imaging of an object
RU2014115603A (en) * 2014-04-18 2015-10-27 Федеральное государственное бюджетное учреждение науки Научно-технологический центр уникального приборостроения РАН ACOUSTOPTIC DEVICE FOR RECEIVING SPECTRAL STEREO IMAGES WITH SPECTRAL TRANSFORMATION
US20200342304A1 (en) * 2019-04-25 2020-10-29 International Business Machines Corporation Feature importance identification in deep learning models
CN111915691A (en) * 2019-05-07 2020-11-10 上海科技大学 Image processing system, method, terminal and medium based on neural network
CN111640069A (en) * 2020-04-17 2020-09-08 上海交通大学 Compressive imaging method, system and device based on light sensing network and phase compensation
CN111507884A (en) * 2020-04-19 2020-08-07 衡阳师范学院 Self-adaptive image steganalysis method and system based on deep convolutional neural network

Also Published As

Publication number Publication date
CN112308941B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
Solomon et al. Deep unfolded robust PCA with application to clutter suppression in ultrasound
AU2013400936B2 (en) Image analysis techniques for diagnosing diseases
Zhou et al. High spatial–temporal resolution reconstruction of plane-wave ultrasound images with a multichannel multiscale convolutional neural network
CN110097512A Construction method and application of a three-dimensional MRI image denoising model based on Wasserstein generative adversarial networks
US11776679B2 (en) Methods for risk map prediction in AI-based MRI reconstruction
CN110610528A (en) Model-based dual-constraint photoacoustic tomography image reconstruction method
Yancheng et al. RED-MAM: A residual encoder-decoder network based on multi-attention fusion for ultrasound image denoising
Qiao et al. A pseudo-siamese feature fusion generative adversarial network for synthesizing high-quality fetal four-chamber views
CN115830016A (en) Medical image registration model training method and equipment
Tehrani et al. Robust scatterer number density segmentation of ultrasound images
Panda et al. A 3D wide residual network with perceptual loss for brain MRI image denoising
CN112819740B (en) Medical image fusion method based on multi-component low-rank dictionary learning
Li et al. Deep attention super-resolution of brain magnetic resonance images acquired under clinical protocols
Lim et al. Motion artifact correction in fetal MRI based on a Generative Adversarial network method
Aja-Fernández et al. Validation of deep learning techniques for quality augmentation in diffusion MRI for clinical studies
CN112308941B (en) Mutual information-based limited view photoacoustic image reconstruction method
Shahid et al. Feasibility of a generative adversarial network for artifact removal in experimental photoacoustic imaging
Ma et al. Edge-guided cnn for denoising images from portable ultrasound devices
Consagra et al. Neural Orientation Distribution Fields for Estimation and Uncertainty Quantification in Diffusion MRI
Seoni et al. Ultrasound Image Beamforming Optimization Using a Generative Adversarial Network
Liu et al. Progressive residual learning with memory upgrade for ultrasound image blind super-resolution
Ekmekci et al. Quantifying Generative Model Uncertainty in Posterior Sampling Methods for Computational Imaging
CN116597041B (en) Nuclear magnetic image definition optimization method and system for cerebrovascular diseases and electronic equipment
CN112085687B (en) Method for converting T1 to STIR image based on detail enhancement
Chen et al. Unsupervised image-to-image translation in multi-parametric MRI of bladder cancer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant