Disclosure of Invention
In order to overcome the defects of the prior art, the technical problem to be solved by the invention is to provide a method for reconstructing a three-dimensional ultrasound image which handles missing regions of irregular shape well and is suitable for reconstructing a missing region at any position.
The technical solution of the invention is as follows: the method for reconstructing a three-dimensional ultrasound image comprises the following steps:
(1) interpolating the acquired two-dimensional B-mode ultrasound slice sequence, which carries spatial positioning information, into a three-dimensional image with holes according to the spatial positions of the slices;
(2) replacing the conventional convolutional layers with three-dimensional partial convolutional layers, each partial convolutional layer being followed by a three-dimensional mask updating step;
(3) constructing a spectrally normalized least-squares generative adversarial network, wherein the generative adversarial network comprises a generator and a discriminator;
(4) combining the content loss and the adversarial loss to construct a new loss function for ultrasound reconstruction, wherein the content loss comprises context loss, total variation loss, and feature matching loss;
(5) inputting the three-dimensional ultrasound image with holes and the three-dimensional mask image simultaneously into the generator of the trained adversarial network, and then generating the three-dimensional ultrasound image with the holes repaired.
The invention interpolates a two-dimensional B-mode ultrasound slice sequence into a three-dimensional image with holes according to the spatial positions of the slices; replaces the conventional convolutional layers with three-dimensional partial convolutional layers, each followed by a three-dimensional mask updating step; constructs a new spectrally normalized least-squares generative adversarial network; combines the content loss and the adversarial loss; and simultaneously inputs the three-dimensional ultrasound image with holes and the three-dimensional mask image into the generator of the trained adversarial network to generate a three-dimensional ultrasound image with the holes repaired. The invention therefore handles missing regions of irregular shape well and is suitable for reconstructing a missing region at any position.
There is also provided an apparatus for reconstructing a three-dimensional ultrasound image, the apparatus comprising:
an interpolation module configured to interpolate the acquired two-dimensional B-mode ultrasound slice sequence, which carries spatial positioning information, into a three-dimensional image with holes according to the spatial positions of the slices;
a three-dimensional partial convolution and three-dimensional mask updating module configured to replace the conventional convolutional layers with three-dimensional partial convolutional layers, each partial convolutional layer being followed by a three-dimensional mask updating step;
a network construction module configured to construct a spectrally normalized least-squares generative adversarial network, the generative adversarial network comprising a generator and a discriminator;
a loss combining module configured to combine the content loss and the adversarial loss to construct a new loss function for ultrasound reconstruction, wherein the content loss comprises context loss, total variation loss, and feature matching loss;
and a restoration module configured to input the three-dimensional ultrasound image with holes and the three-dimensional mask image simultaneously into the generator of the trained adversarial network and then generate the three-dimensional ultrasound image with the holes repaired.
Detailed Description
As shown in fig. 4, the method for reconstructing a three-dimensional ultrasound image includes the following steps:
(1) interpolating the acquired two-dimensional B-mode ultrasound slice sequence, which carries spatial positioning information, into a three-dimensional image with holes according to the spatial positions of the slices;
(2) replacing the conventional convolutional layers with three-dimensional partial convolutional layers, each partial convolutional layer being followed by a three-dimensional mask updating step;
(3) constructing a new spectrally normalized least-squares generative adversarial network, wherein the generative adversarial network comprises a generator and a discriminator;
(4) combining the content loss and the adversarial loss to construct a new loss function for ultrasound reconstruction, wherein the content loss comprises context loss, total variation loss, and feature matching loss;
(5) inputting the three-dimensional ultrasound image with holes and the three-dimensional mask image simultaneously into the generator of the trained adversarial network, and then generating the three-dimensional ultrasound image with the holes repaired.
The invention interpolates a two-dimensional B-mode ultrasound slice sequence into a three-dimensional image with holes according to the spatial positions of the slices; replaces the conventional convolutional layers with three-dimensional partial convolutional layers, each followed by a three-dimensional mask updating step; constructs a new spectrally normalized least-squares generative adversarial network; combines the content loss and the adversarial loss; and simultaneously inputs the three-dimensional ultrasound image with holes and the three-dimensional mask image into the generator of the trained adversarial network to generate a three-dimensional ultrasound image with the holes repaired. The invention therefore handles missing regions of irregular shape well and is suitable for reconstructing a missing region at any position.
Preferably, in step (1), an ultrasound slice sequence with spatial positioning information is acquired for an individual subject, and 3D volume data are interpolated according to the spatial positioning information between the slices. The interpolation process is as follows: first, an empty three-dimensional volume with a spatial coordinate system is established, the coordinate system comprising an origin, a size, and a volume grid spacing; then the pixels of each two-dimensional ultrasound image are mapped to the corresponding nearby voxels of the three-dimensional volume according to the spatial position information.
Preferably, in step (2), the partial convolutional layer is defined by formula (1):
where W denotes the weights of the convolution filter, b is the corresponding bias, X represents the feature values in the current convolution window, and M is the corresponding binary mask, in which 1 indicates that the voxel at position (x, y, z) is valid and 0 indicates that the voxel at (x, y, z) is invalid; the convolution output depends only on the unmasked inputs, and a scaling factor 1/sum(M) is applied to compensate for the varying number of unmasked inputs; given a three-dimensional binary mask, the three-dimensional convolution result depends only on the content of the known region at each layer;
the mask update is performed after each partial convolution operation, and is expressed as equation (2):
if the convolution was able to condition its output on at least one valid input value, the mask at that location is removed (marked valid); thus, as long as the input contains any valid voxels, sufficient applications of the partial convolutional layers will shrink even a large masked region, and the mask will eventually become all ones.
Preferably, in step (2), all normal convolutions are replaced by three-dimensional partial convolutions, the image is passed through the network together with the mask, and a residual block structure and skip connections are introduced into the decoder of a 3D U-Net network architecture. All convolutional layers use a 3 × 3 convolution kernel; a three-dimensional LeakyReLU activation layer with alpha = 0.2 is used in the decoder stage, and a three-dimensional ReLU activation layer is used in all encoding layers and in all layers of the discriminator. Except for the first and last layers, a three-dimensional normalization layer is used between each three-dimensional partial convolutional layer and the following three-dimensional LeakyReLU or ReLU activation layer. In the decoding stage, the three-dimensional normalization layer is followed by a dropout layer with a rate of 0.5 to prevent overfitting to the training data. A mask of size D × H × W × C, the same size as the image, is defined, and the mask updating process is implemented with a fixed layer whose convolution kernel has the same size as that of the partial convolution operation but whose weights are set to 1 and whose bias is set to 0. All three-dimensional dropout layers and three-dimensional LeakyReLU and ReLU activation layers act only on the partial convolution outputs and not on the mask update layers. Different learning rates are set for the generator and the discriminator.
Preferably, in step (3), the training of the generator and discriminator networks is further stabilized using a three-dimensional spectral normalization method, which controls the Lipschitz constants of the generator and the discriminator by normalizing the weights of each convolutional layer in the network; the new weights after normalization are defined as:
where W denotes the weights of each layer and σ(W) is the largest singular value of the weights; the method applies a global regularization to the discriminator and the generator.
Preferably, in step (4), given an image I_in with missing regions, an initial binary mask M in which 0 represents a hole, a generated image I_out, and a gold standard image I_gt;
first, a context loss L_c is defined to capture the remaining available information in the input image I_in; the context loss is based on the assumption that voxels far from the missing region are less important to the repair process, and the importance W is defined by formula (4):
where i denotes the index of a voxel, W_i represents the importance of the voxel at position i, N(i) denotes the neighborhood set of voxel i in the local window, and |N(i)| is the cardinality of N(i);
the context loss L_c is defined as:
L_c = || W ⊙ (I_out − I_gt) ||_1    (5)
where ⊙ denotes element-wise multiplication.
The total variation loss L_tv, given by equation (6), is a smoothness penalty on the generated content within P, where P is the region obtained by dilating the missing region by 1 voxel,
where I_comp is the generated output image I_out in which the voxels of the non-missing region are directly replaced by the corresponding voxels of the gold standard image I_gt;
the feature matching loss L_FM is defined by formula (7):
where L is the last layer of the discriminator, N_i is the total number of elements in the i-th layer, and D^(i) is the activation map of the i-th layer of the discriminator;
the adversarial loss L_GAN(G_sn, D_sn) in equation (8) is computed with the least-squares GAN introduced in the training phase, and is obtained by training the generator and the discriminator simultaneously to solve arg min_G max_D L_GAN(G_sn, D_sn):
where D_sn denotes the spectrally normalized discriminator, G_sn denotes the spectrally normalized ultrasound generator, and the input is an ultrasound volume y with a missing region.
Preferably, in step (5), the training process of the generative adversarial network is as follows: a content loss function is obtained from the output of the generator and the corresponding real ultrasound image; based on the least-squares loss functions of the generator and the discriminator in the generative adversarial network, the total loss function of the generator is obtained from the output of the discriminator and the content loss function, and the total loss function of the discriminator is obtained from the output of the discriminator; and the parameters in the network structures of the discriminator and the generator are updated according to the total loss function of the discriminator and the total loss function of the generator, respectively, until the generative adversarial network converges.
It will be understood by those skilled in the art that all or part of the steps in the method of the above embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the method of the above embodiments; the storage medium may be ROM/RAM, a magnetic disk, an optical disk, a memory card, or the like. Therefore, corresponding to the method of the present invention, the present invention also includes a three-dimensional ultrasound image reconstruction apparatus, which is generally represented in the form of functional modules corresponding to the steps of the method. The apparatus includes:
an interpolation module configured to interpolate the acquired two-dimensional B-mode ultrasound slice sequence, which carries spatial positioning information, into a three-dimensional image with holes according to the spatial positions of the slices;
a three-dimensional partial convolution and three-dimensional mask updating module configured to replace the conventional convolutional layers with three-dimensional partial convolutional layers, each partial convolutional layer being followed by a three-dimensional mask updating step;
a network construction module configured to construct a new spectrally normalized least-squares generative adversarial network, the generative adversarial network comprising a generator and a discriminator;
a loss combining module configured to combine the content loss and the adversarial loss to construct a new loss function for ultrasound reconstruction, wherein the content loss comprises context loss, total variation loss, and feature matching loss;
and a restoration module configured to input the three-dimensional ultrasound image with holes and the three-dimensional mask image simultaneously into the generator of the trained adversarial network and then generate the three-dimensional ultrasound image with the holes repaired.
The present invention is described in more detail below.
Ultrasound slice sequences with spatial positioning information are acquired from the same patient and interpolated into 3D volume data according to the spatial information between the sequences. The interpolation process first creates an empty three-dimensional volume with a spatial coordinate system (including origin, size, and volume grid spacing). After the grid is constructed, the pixels in the two-dimensional ultrasound images are mapped to the corresponding nearby voxels of the three-dimensional volume according to the spatial position information. The coordinate relationship between an ultrasound slice and the three-dimensional volume is shown in figure 1. Due to the subjectivity of handheld scanning, the collected two-dimensional ultrasound images are typically highly sparse; thus, after the above voxel mapping is completed, there will be gaps in the three-dimensional ultrasound volume, as shown in the right part of fig. 1. The invention aims to fill these gaps with a three-dimensional least-squares generative adversarial network, which can not only fill the gaps with the information of the known voxels but also synthesize non-repetitive textures and structures.
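For illustration only, the following sketch (not the patented implementation; the slice pose format, array layout, and nearest-voxel assignment are assumptions introduced here) shows how tracked two-dimensional slices could be scattered into an empty volume, yielding both the sparse volume and the binary validity mask used by the later stages:

```python
import numpy as np

def slices_to_volume(slices, poses, vol_shape, origin, spacing):
    """Scatter tracked 2-D ultrasound slices into an empty 3-D volume.

    slices    : list of 2-D float arrays (pixel intensities)
    poses     : list of 4x4 matrices mapping homogeneous pixel coordinates
                (u, v, 0, 1) to physical coordinates (x, y, z)
    vol_shape : (X, Y, Z) size of the target volume
    origin    : physical coordinate of voxel (0, 0, 0)
    spacing   : voxel size along each axis
    """
    volume = np.zeros(vol_shape, dtype=np.float32)
    mask = np.zeros(vol_shape, dtype=np.uint8)           # 1 = known voxel, 0 = gap
    shape = np.asarray(vol_shape)
    for img, pose in zip(slices, poses):
        h, w = img.shape
        v, u = np.mgrid[0:h, 0:w]
        pix = np.stack([u.ravel(), v.ravel(),
                        np.zeros(u.size), np.ones(u.size)])
        xyz = (pose @ pix)[:3].T                          # pixel -> physical coordinates
        idx = np.round((xyz - origin) / spacing).astype(int)   # nearest voxel index
        ok = np.all((idx >= 0) & (idx < shape), axis=1)         # keep points inside the grid
        ix, iy, iz = idx[ok].T
        volume[ix, iy, iz] = img.ravel()[ok]
        mask[ix, iy, iz] = 1
    return volume, mask
```

Voxels left at zero in both the volume and the mask are the gaps (holes) that the network is later asked to fill.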
The definition of the partial convolutional layer is as follows:
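The formula image itself is not reproduced here; a plausible reconstruction of formula (1), following the standard partial-convolution formulation and consistent with the symbols explained below, is:

```latex
x' =
\begin{cases}
  W^{T}(X \odot M)\,\dfrac{1}{\operatorname{sum}(M)} + b, & \text{if } \operatorname{sum}(M) > 0 \\[4pt]
  0, & \text{otherwise}
\end{cases}
\qquad (1)
```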
where W denotes the weights of the convolution filter, b is the corresponding bias, and X represents the feature values in the current convolution window. M is the corresponding binary mask: 1 indicates that the voxel at position (x, y, z) is valid, and 0 indicates that the voxel at (x, y, z) is invalid. From equation (1) it can be seen that the output value of the convolution depends only on the unmasked inputs. The scaling factor 1/sum(M) is applied to compensate for the varying number of unmasked inputs. Given a three-dimensional binary mask, the three-dimensional convolution results of the present invention depend only on the contents of the known regions of each layer.
After each partial convolution operation we perform an update of the mask, which is expressed as:
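A plausible reconstruction of this mask-update rule (the original formula image is not reproduced here) is:

```latex
m' =
\begin{cases}
  1, & \text{if } \operatorname{sum}(M) > 0 \\
  0, & \text{otherwise}
\end{cases}
\qquad (2)
```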
If the convolution was able to condition its output on at least one valid input value, the mask at that location is removed (marked valid). This process can be implemented in any network architecture. As long as the input contains any valid voxels, with sufficient applications of the partial convolutional layers even a large masked region will shrink, and the mask will eventually be updated to all ones.
The proposed network architecture is similar to 3D U-Net, with all normal convolutions replaced by three-dimensional partial convolutions, so the image is passed through the network together with the mask; figure 2 shows the architecture of the entire network. In the decoder of the 3D U-Net architecture, a residual block structure and skip connections are introduced, as shown in fig. 3, similar to the structure of SRGAN. Skip connections relieve the network from having to model the identity mapping, which is not easy to represent with convolution kernels. All convolutional layers use a 3 × 3 convolution kernel; a three-dimensional LeakyReLU activation layer with alpha = 0.2 is used in the decoder stage, and a three-dimensional ReLU activation layer is used in all encoding layers and in all layers of the discriminator. Except for the first and last layers, a three-dimensional normalization layer (to counteract internal covariate shift) is used between each three-dimensional partial convolution layer and the following three-dimensional LeakyReLU or ReLU layer. In the decoding stage, the three-dimensional normalization layer is followed by a dropout layer with a rate of 0.5 to prevent overfitting to the training data.
A mask of size D × H × W × C, the same size as the image, is defined, and the mask updating process is implemented with a fixed layer whose convolution kernel has the same size as that of the partial convolution operation but whose weights are set to 1 and whose bias is set to 0. Note that all three-dimensional dropout layers and three-dimensional LeakyReLU/ReLU layers act only on the partial convolution outputs and not on the mask update layer.
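As a sketch of how the three-dimensional partial convolution and the fixed mask-update layer could be implemented (an illustrative PyTorch-style example, not the patented code; the channel layout, the clamp, and the alternative rescaling noted in the comments are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv3d(nn.Module):
    """3-D partial convolution plus the fixed mask-update convolution
    (kernel weights fixed to 1, bias 0) described above."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size, stride, padding, bias=True)
        # Fixed kernel used only to count valid voxels in each window.
        self.register_buffer(
            "mask_kernel", torch.ones(1, 1, kernel_size, kernel_size, kernel_size))
        self.stride, self.padding = stride, padding

    def forward(self, x, mask):
        # x: (B, C, D, H, W) features; mask: (B, 1, D, H, W) with 1 = valid voxel.
        with torch.no_grad():
            valid = F.conv3d(mask, self.mask_kernel,
                             stride=self.stride, padding=self.padding)  # sum(M) per window
            new_mask = (valid > 0).float()                              # mask update, eq. (2)
            scale = 1.0 / valid.clamp(min=1.0)                          # 1/sum(M); some
            # implementations rescale by window_size/sum(M) instead.
        out = self.conv(x * mask)                                       # convolve masked input
        bias = self.conv.bias.view(1, -1, 1, 1, 1)
        out = (out - bias) * scale * new_mask + bias * new_mask         # eq. (1)
        return out, new_mask
```

Downstream activation and dropout layers would then be applied only to the feature output, leaving the updated mask untouched, as stated above.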
When training the network, different learning rates may be set for the generator and the discriminator in order to balance their training speeds.
The training of the generator and discriminator networks is further stabilized by a three-dimensional spectral normalization method. Three-dimensional spectral normalization aims at controlling the Lipschitz constants of the generator and the discriminator (assuming bounded statistics) by normalizing the weights of each convolutional layer in the network. The new weights after normalization are defined as:
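The formula image is not reproduced here; given the definition below, the normalized weight (presumably formula (3) in the original numbering) takes the standard spectral-normalization form:

```latex
\bar{W}_{\mathrm{SN}} = \frac{W}{\sigma(W)}
\qquad (3)
```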
where W denotes the weights of each layer and σ(W) is the largest singular value of the weights. Unlike other weight normalizations, spectral normalization allows the parameter matrix to use as many features as possible while satisfying the local 1-Lipschitz constraint, and it can improve the quality of the generated images. The method applies a global regularization to the discriminator and the generator and can easily be combined with least-squares generative adversarial networks or other forms of GANs.
In order to improve the quality of the reconstruction, a content loss is designed for the training of the generator; it consists of three parts, namely the context loss, the total variation loss, and the feature matching loss. Given an image I_in with missing regions, an initial binary mask M (0 representing a hole), a generated image I_out, and a gold standard image I_gt, a context loss L_c is first defined to capture the remaining available information in the input image I_in. The context loss is based on the assumption that voxels far from the missing region are less important to the repair process. The importance W is defined as:
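The original formula (4) is not reproduced here. One plausible form, borrowed from the importance weighting commonly used in semantic inpainting and consistent with the description (an assumption, not necessarily the exact expression of the invention), gives higher weight to known voxels that border the hole, with M_j denoting the mask value of neighboring voxel j:

```latex
W_i =
\begin{cases}
  \dfrac{\sum_{j \in N(i)} (1 - M_j)}{|N(i)|}, & \text{if } M_i \neq 0 \\[6pt]
  0, & \text{if } M_i = 0
\end{cases}
\qquad (4)
```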
where i denotes the index of a voxel, W_i represents the importance of the voxel at position i, N(i) denotes the neighborhood set of voxel i in the local window, and |N(i)| is the cardinality of N(i). The context loss L_c is defined as:
L_c = || W ⊙ (I_out − I_gt) ||_1    (5)
where ⊙ denotes element-wise multiplication.
Next, the total variation loss L_tv is defined. It is a smoothness penalty on the generated content within P, where P is the region obtained by dilating the missing region by 1 voxel:
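A plausible reconstruction of formula (6) (the formula image is not reproduced here), extending the usual first-difference total-variation penalty to three dimensions over the dilated region P with voxel index (i, j, k), is:

```latex
L_{tv} = \sum_{(i,j,k) \in P}
  \Big( \big\| I_{comp}^{\,i,j,k+1} - I_{comp}^{\,i,j,k} \big\|_1
      + \big\| I_{comp}^{\,i,j+1,k} - I_{comp}^{\,i,j,k} \big\|_1
      + \big\| I_{comp}^{\,i+1,j,k} - I_{comp}^{\,i,j,k} \big\|_1 \Big)
\qquad (6)
```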
where I_comp is the generated output image I_out in which the voxels of the non-missing region are directly replaced by the corresponding voxels of the gold standard image I_gt.
The final loss is the feature matching loss L_FM, which is similar to the perceptual loss that measures perceptual differences between images. The perceptual loss computes activation maps with a pre-trained VGG19 network, whereas the feature matching loss compares the activation maps of the intermediate layers of the discriminator, as shown in FIG. 2; it measures the difference between the output image and the real image and forces the generator to produce outputs similar to the real ones. The context loss and the total variation loss do not measure the intrinsic similarity between ultrasound volumes, only their surface differences in Euclidean distance. To compare the internal structures of ultrasound volumes, they should be projected onto a manifold and their geodesic distances calculated, so exploiting the feature matching loss helps to produce sharper and more detailed reconstruction results. The feature matching loss L_FM is defined as:
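A plausible reconstruction of formula (7) (the formula image is not reproduced here), writing D^(i) for the activation map of the i-th discriminator layer (notation introduced here), is:

```latex
L_{FM} = \sum_{i=1}^{L} \frac{1}{N_i}
  \big\| D^{(i)}(I_{gt}) - D^{(i)}(I_{out}) \big\|_1
\qquad (7)
```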
where L is the last layer of the discriminator, N_i is the total number of elements in the i-th layer, and D^(i) is the activation map of the i-th layer of the discriminator.
The adversarial loss is calculated with the least-squares GAN introduced in the training phase, and is obtained by training the generator and the discriminator simultaneously to solve arg min_G max_D L_GAN(G_sn, D_sn):
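Formula (8) is not reproduced here; the standard least-squares GAN objectives, to which the min-max formulation above presumably corresponds (an assumption), are, with x denoting a gold standard ultrasound volume:

```latex
\min_{D_{sn}} \; \tfrac{1}{2}\,\mathbb{E}_{x}\!\big[(D_{sn}(x)-1)^2\big]
            + \tfrac{1}{2}\,\mathbb{E}_{y}\!\big[D_{sn}(G_{sn}(y))^2\big],
\qquad
\min_{G_{sn}} \; \tfrac{1}{2}\,\mathbb{E}_{y}\!\big[(D_{sn}(G_{sn}(y))-1)^2\big]
\qquad (8)
```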
where D_sn denotes the spectrally normalized discriminator, G_sn denotes the spectrally normalized ultrasound generator, and the input is an ultrasound volume y with a missing region.
Therefore, the training process of the generative adversarial network is as follows: a content loss function is obtained from the output of the generator and the corresponding real ultrasound image; based on the least-squares loss functions of the generator and the discriminator in the generative adversarial network, the total loss function of the generator is obtained from the output of the discriminator and the content loss function, and the total loss function of the discriminator is obtained from the output of the discriminator; and the parameters in the network structures of the discriminator and the generator are updated according to the total loss function of the discriminator and the total loss function of the generator, respectively, until the generative adversarial network converges.
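As an illustrative sketch of this alternating training scheme (PyTorch-style; the function names, the adversarial weight lambda_adv, and the optimizer handling are assumptions, not the patented settings):

```python
import torch

def train_step(gen, disc, g_opt, d_opt, vol_in, mask, vol_gt,
               content_loss, lambda_adv=0.01):
    """One alternating least-squares GAN update: discriminator, then generator."""
    # ---- discriminator update (least-squares objective) ----
    with torch.no_grad():
        fake = gen(vol_in, mask)
    d_loss = 0.5 * ((disc(vol_gt) - 1) ** 2).mean() + 0.5 * (disc(fake) ** 2).mean()
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # ---- generator update: content loss + least-squares adversarial term ----
    fake = gen(vol_in, mask)
    g_adv = 0.5 * ((disc(fake) - 1) ** 2).mean()
    g_loss = content_loss(fake, vol_gt, mask) + lambda_adv * g_adv
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

The step would be repeated over the training set, with separate learning rates for the generator and discriminator optimizers as noted earlier, until the network converges.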
The generative adversarial network is obtained by the above deep-learning method; the three-dimensional ultrasound image with holes and the three-dimensional mask image are then simultaneously input into the generator of the trained adversarial network, which generates a three-dimensional ultrasound image with the holes repaired, and the synthesized ultrasound image closely resembles a real ultrasound image.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.