CN112734907B - Ultrasonic or CT medical image three-dimensional reconstruction method - Google Patents

Ultrasonic or CT medical image three-dimensional reconstruction method

Info

Publication number
CN112734907B
Authority
CN
China
Prior art keywords
image
network
multiplied
tensor
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011623243.2A
Other languages
Chinese (zh)
Other versions
CN112734907A (en)
Inventor
全红艳
钱笑笑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202011623243.2A priority Critical patent/CN112734907B/en
Publication of CN112734907A publication Critical patent/CN112734907A/en
Application granted granted Critical
Publication of CN112734907B publication Critical patent/CN112734907B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a three-dimensional reconstruction method for ultrasonic or CT medical images, characterized by unsupervised learning: a three-dimensional reconstruction result is obtained from an input ultrasonic or CT image sequence by designing three convolutional neural networks A, B and C, obtaining the network parameters through training, and then recovering the three-dimensional structure of the ultrasonic or CT images. The invention can effectively realize three-dimensional reconstruction of ultrasonic or CT images, give full play to the role of auxiliary diagnosis in artificial intelligence assisted diagnosis, and improve the efficiency of auxiliary diagnosis through 3D visualization of the reconstruction results.

Description

Ultrasonic or CT medical image three-dimensional reconstruction method
Technical Field
The invention belongs to the field of computer technology, relates to the problem of intelligent auxiliary diagnosis from ultrasonic or CT images, and in particular to a three-dimensional reconstruction method for ultrasonic or CT images intended for intelligent medical auxiliary diagnosis.
Background
In recent years, artificial intelligence technology has developed rapidly. Three-dimensional image reconstruction is a key technology of medical auxiliary diagnosis, and its research significance is great. At present, some three-dimensional reconstruction technologies for CT images, nuclear magnetic resonance images and the like have appeared; for ultrasonic images, however, the recovery of camera parameters presents certain difficulties, so research on three-dimensional reconstruction of ultrasonic images remains challenging. How to establish an effective deep learning network model and effectively solve the three-dimensional reconstruction problem of ultrasonic images is therefore a practical problem to be solved urgently, and such a reconstruction method for medical images has clear application value.
Disclosure of Invention
The invention provides a deep-learning-based three-dimensional reconstruction method for ultrasonic or CT images, which can obtain more accurate three-dimensional reconstruction results and has high practical value.
The specific technical scheme for realizing the purpose of the invention is as follows:
a three-dimensional reconstruction method of ultrasonic or CT medical images inputs an ultrasonic or CT image sequence, the image resolution is MxN, M is more than or equal to 100 and less than or equal to 1500, N is more than or equal to 100 and less than or equal to 1500, the three-dimensional reconstruction process specifically comprises the following steps:
step 1: building a data set
(a) Constructing a natural image dataset D
Selecting a natural image website, requiring image sequences and corresponding internal parameters of a camera, downloading a image sequences and the corresponding internal parameters of the sequences from the natural image website, wherein a is more than or equal to 1 and less than or equal to 20, for each image sequence, recording every 3 adjacent frames of images as an image b, an image c and an image d, splicing the image b and the image d according to a color channel to obtain an image tau, forming a data element by the image c and the image tau, wherein the image c is a natural target image, the sampling viewpoint of the image c is used as a target viewpoint, and the internal parameters of the image b, the image c and the image d are all e_t (t = 1, 2, 3, 4), in which e_1 is a horizontal focal length, e_2 is a vertical focal length, and e_3 and e_4 are two components of the principal point coordinates; if the last remaining image in the same image sequence is less than 3 frames, discarding; constructing a data set D by using all the sequences, wherein the data set D has f elements, and f is more than or equal to 3000 and less than or equal to 20000;
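As an illustration of this grouping step, the selection of every 3 adjacent frames and the channel-wise splicing of the two outer frames can be sketched as follows in Python (the environment used in the embodiment). The function name, the dictionary layout and the use of NumPy arrays are illustrative assumptions rather than part of the patented method.

import numpy as np

def make_data_elements(frames, intrinsics=None):
    """Group a sequence into (target image, spliced neighbours) data elements.

    frames: list of H x W x 3 arrays. Any trailing group with fewer than
    3 frames is discarded, as described in step 1. intrinsics holds the
    labels e_1..e_4 when they are available (natural image sequences).
    """
    elements = []
    for s in range(0, len(frames) - len(frames) % 3, 3):
        b, c, d = frames[s], frames[s + 1], frames[s + 2]
        spliced = np.concatenate([b, d], axis=-1)   # splice along the colour channels: H x W x 6
        elements.append({"target": c, "neighbours": spliced, "intrinsics": intrinsics})
    return elements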
(b) constructing an ultrasound image dataset E
Sampling g ultrasonic image sequences, wherein g is more than or equal to 1 and less than or equal to 20, recording every adjacent 3 frames of images of each sequence as an image i, an image j and an image k, splicing the image i and the image k according to a color channel to obtain an image pi, forming a data element by the image j and the image pi, wherein the image j is an ultrasonic target image, and a sampling viewpoint of the image j is used as a target viewpoint;
(c) construction of a CT image dataset G
Sampling h CT image sequences, wherein h is more than or equal to 1 and less than or equal to 20, recording every adjacent 3 frames of each sequence as an image l, an image m and an image n, splicing the image l and the image n according to a color channel to obtain an image sigma, forming a data element by the image m and the image sigma, taking the image m as a CT target image, taking a sampling viewpoint of the image m as a target viewpoint, if the last residual image in the same image sequence is less than 3 frames, abandoning, and constructing a data set G by using all the sequences, wherein the data set G has xi elements, and xi is more than or equal to 1000 and less than or equal to 20000;
step 2: constructing neural networks
The resolution of the image or video processed by the neural network is p x o, p is the width, o is the height, and the resolution is 100-2000, 100-2000;
(1) structure of network A
Taking tensor H as input, the scale is alpha multiplied by o multiplied by p multiplied by 3, o is more than or equal to 100 and less than or equal to 2000, p is more than or equal to 100 and less than or equal to 2000, taking tensor I as output, the scale is alpha multiplied by o multiplied by p multiplied by 1, and alpha is the number of batches;
the network A consists of an encoder and a decoder, and for the tensor H, the output tensor I is obtained after the encoding and decoding processing is carried out in sequence;
the encoder consists of 5 residual error units, the 1 st to 5 th units respectively comprise 2, 3, 4, 6 and 3 residual error modules, each residual error module performs convolution for 3 times, the shapes of convolution kernels are 3 multiplied by 3, the number of the convolution kernels is 64, 64, 128, 256 and 512, and a maximum pooling layer is included behind the first residual error unit;
the decoder is composed of 6 decoding units, each decoding unit comprises two steps of deconvolution and convolution, the shapes and the numbers of convolution kernels of the deconvolution and convolution are the same, the shapes of convolution kernels of the 1 st to 6 th decoding units are all 3 x 3, the numbers of the convolution kernels are 512, 256, 128, 64, 32 and 16 respectively, cross-layer connection is carried out between network layers of the encoder and the decoder, and the corresponding relation of the cross-layer connection is as follows: 1 and 4, 2 and 3, 3 and 2, 4 and 1;
(2) structure of network B
Tensor J and tensor K are used as input, the scales are respectively alpha multiplied by o multiplied by p multiplied by 3 and alpha multiplied by o multiplied by p multiplied by 6; tensor L and tensor O are used as output, the scales are respectively alpha multiplied by 2 multiplied by 6 and alpha multiplied by 4 multiplied by 1, and alpha is the number of batches;
the network B is composed of a module P and a module Q, 11 layers of convolution units are shared, firstly, a tensor J and a tensor K are spliced according to a last channel to obtain a tensor with the scale of alpha multiplied by O multiplied by P multiplied by 9, and an output tensor L and a tensor O are respectively obtained after the tensor is processed by the module P and the module Q;
the module Q and the module P share a front 4-layer convolution unit, and the front 4-layer convolution unit has the structure that the convolution kernel scales in the front two-layer unit are respectively 7 multiplied by 7 and 5 multiplied by 5, the convolution kernel scales from the 3 rd layer to the 4 th layer are all 3 multiplied by 3, and the number of convolution kernels from 1 layer to 4 layers is 16, 32, 64 and 128 in sequence;
for the module P, except for sharing 4 layers, the module P occupies convolution units from the 5 th layer to the 7 th layer of the network B, the scale of convolution kernels is 3 multiplied by 3, the number of the convolution kernels is 256, after the convolution processing is carried out on the processing result of the 7 th layer by using 12 convolution kernels of 3 multiplied by 3, the 12 results are sequentially arranged into 2 rows, and the result of the tensor L is obtained;
for the module Q, except for 1 to 4 layers of the shared network B, 8 th to 11 th layers of convolution units of the network B are occupied, 2 nd layer output of the network B is used as 8 th layer input of the network B, the shapes of convolution kernels in the 8 th to 11 th layers of convolution units are all 3 multiplied by 3, the number of the convolution kernels is all 256, and after convolution processing is carried out on the 11 th layer result by using 4 convolution kernels of 3 multiplied by 3, tensor O results are obtained from 4 channels;
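A corresponding tf.keras sketch of network B follows: the 4-layer shared trunk, module P producing the 2 x 6 pose tensor L, and module Q producing the 4 x 1 intrinsics tensor O from the trunk's 2nd-layer output. The strides, activations and the spatial average pooling before each reshape are assumptions not stated in the text.

import tensorflow as tf
from tensorflow.keras import layers as L

def build_network_b(height=128, width=416):
    img = L.Input((height, width, 3))     # tensor J: the target image
    nbr = L.Input((height, width, 6))     # tensor K: the spliced neighbouring images
    x = L.Concatenate()([img, nbr])       # alpha x o x p x 9

    # Shared trunk: 4 convolution units (7x7/16, 5x5/32, 3x3/64, 3x3/128).
    trunk = []
    for k, f in zip([7, 5, 3, 3], [16, 32, 64, 128]):
        x = L.Conv2D(f, k, strides=2, padding="same", activation="relu")(x)
        trunk.append(x)

    # Module P (layers 5-7, 3x3/256), then 12 values arranged as 2 rows of 6 pose parameters.
    p = trunk[-1]
    for _ in range(3):
        p = L.Conv2D(256, 3, strides=2, padding="same", activation="relu")(p)
    p = L.Conv2D(12, 3, padding="same")(p)
    pose = L.Reshape((2, 6))(L.GlobalAveragePooling2D()(p))        # tensor L (pooling is an assumption)

    # Module Q (layers 8-11, 3x3/256), fed from the trunk's 2nd-layer output.
    q = trunk[1]
    for _ in range(4):
        q = L.Conv2D(256, 3, strides=2, padding="same", activation="relu")(q)
    q = L.Conv2D(4, 3, padding="same")(q)
    intrinsics = L.Reshape((4, 1))(L.GlobalAveragePooling2D()(q))  # tensor O: e_1..e_4

    return tf.keras.Model([img, nbr], [pose, intrinsics])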
(3) structure of network C
Taking tensor R and tensor S as input, wherein the scales are respectively alpha multiplied by o multiplied by p multiplied by 3 and alpha multiplied by o multiplied by p multiplied by 6, taking tensor T as output, the scale is alpha multiplied by 1, and alpha is the number of batches;
the network C is used as a discriminator and is composed of 5 layers of convolution units, firstly, a tensor R and a tensor S are spliced according to a last channel to obtain a tensor with the scale of alpha multiplied by o multiplied by p multiplied by 9, and the tensor is processed by the following 5 layers of convolution units to obtain an output tensor T;
the 1 st to 5 th layers of convolution units of the network C respectively comprise 1 time of convolution processing and 1 time of activation processing, the scales of convolution kernels are all 3 multiplied by 3, the number of the convolution kernels in the 1 st to 5 th layers of convolution units is respectively 64, 128, 256, 512 and 1, and tensor T is obtained from the 5 th layer of results;
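A sketch of the discriminator, network C, in the same style: five convolution units with 64/128/256/512/1 kernels applied to the spliced 9-channel input. The LeakyReLU/sigmoid activations, the strides and the global average pooling that collapses the 5th-layer result to the alpha x 1 tensor T are assumptions.

import tensorflow as tf
from tensorflow.keras import layers as L

def build_network_c(height=128, width=416):
    target = L.Input((height, width, 3))        # tensor R: the target image
    synthesized = L.Input((height, width, 6))   # tensor S: the two synthesized images, spliced
    x = L.Concatenate()([target, synthesized])  # alpha x o x p x 9
    for filters in [64, 128, 256, 512]:
        x = L.Conv2D(filters, 3, strides=2, padding="same")(x)
        x = L.LeakyReLU(0.2)(x)
    x = L.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    score = L.GlobalAveragePooling2D()(x)       # tensor T: alpha x 1
    return tf.keras.Model([target, synthesized], score)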
and step 3: training of neural networks
Respectively dividing samples in a data set D, a data set E and a data set G into a training set and a testing set according to a ratio of 9:1, wherein data in the training set is used for training, data in the testing set is used for testing, training data are respectively obtained from the corresponding data sets when the following steps are trained, the training data are uniformly scaled to a resolution ratio p x o and input into a corresponding network, iterative optimization is carried out, and loss of each batch is minimized by continuously modifying network model parameters;
in the training process, the calculation method of each loss comprises the following steps:
internal parameter supervision synthesis loss: in the network model parameter training of the natural image, the output tensor I of the network A is taken as the depth, and the output result L of the network B and the internal parameter label e_t (t = 1, 2, 3, 4) of the training data are respectively used as the pose parameters and the camera internal parameters; two images at the viewpoint of the image c are respectively synthesized by using the image b and the image d according to a computer vision algorithm, and the loss is respectively calculated by using the image c and the two images according to the sum of the intensity differences of pixel-by-pixel and color-by-color channels;
unsupervised synthesis loss: in the network model parameter training of ultrasonic or CT images, the output tensor I of the network A is used as depth, the output tensor L and the tensor O of the network B are respectively used as pose parameters and camera internal parameters, images at the view point of a target image are respectively synthesized by using two adjacent images of the target image according to a computer vision algorithm, and the target image and the image at the view point of the target image are respectively calculated according to the sum of the intensity differences of pixel-by-pixel and color-by-color channels;
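The two synthesis losses above both amount to warping a neighbouring frame into the target viewpoint using the predicted depth, pose and camera intrinsics, and then summing the per-pixel, per-colour-channel intensity differences. The following NumPy sketch shows the reprojection and the intensity term; the conversion of network B's 6-parameter pose into the 4 x 4 matrix T and the bilinear sampling of the source image are omitted and assumed to be standard.

import numpy as np

def intrinsics_matrix(e):
    """Assemble K from the internal parameters e_1..e_4 (focal lengths and principal point)."""
    e1, e2, e3, e4 = e
    return np.array([[e1, 0.0, e3],
                     [0.0, e2, e4],
                     [0.0, 0.0, 1.0]])

def reproject(depth, T, K):
    """Map each target pixel to the source view: p_s ~ K [R|t] (D(p_t) K^-1 p_t).

    depth: H x W map predicted by network A; T: 4 x 4 relative camera pose;
    K: 3 x 3 intrinsics. Returns H x W x 2 source-pixel coordinates that a
    bilinear sampler would use to synthesize the image at the target viewpoint.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x HW homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)                 # back-project to 3-D
    cam = np.vstack([cam, np.ones((1, cam.shape[1]))])                  # homogeneous coordinates
    src = K @ (T @ cam)[:3]                                             # transform and project
    return (src[:2] / src[2]).T.reshape(H, W, 2)

def synthesis_loss(target, synthesized):
    """Sum of pixel-by-pixel, colour-channel-by-colour-channel intensity differences."""
    return np.abs(target.astype(np.float64) - synthesized.astype(np.float64)).sum()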
internal parameter error loss: calculated by utilizing the output result O of the network B and the internal parameter label e_t (t = 1, 2, 3, 4) of the training data, as the sum of the absolute values of the differences of the components;
spatial structure error loss: in the network model parameter training of ultrasonic or CT images, the output tensor I of a network A is used as depth, the output tensor L and the output tensor O of a network B are respectively used as pose parameters and camera internal parameters, a target image is reconstructed by taking the viewpoint of the target image as the origin of a camera coordinate system according to a computer vision algorithm, a RANSAC algorithm is adopted to fit a spatial structure of reconstruction points, and Euclidean distance from each reconstruction point of the target image to the spatial geometric structure is calculated;
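For the spatial structure error loss, one concrete realization is to fit a plane to the reconstructed points with RANSAC and to penalise the Euclidean distance of every point to it; the plane model, the iteration count and the inlier threshold below are assumptions, since the text only states that RANSAC fits "a spatial structure".

import numpy as np

def ransac_plane(points, iters=200, thresh=0.01, seed=0):
    """Fit a plane n.x + d = 0 to N x 3 reconstructed points with RANSAC."""
    rng = np.random.default_rng(seed)
    best_inliers, best_model = -1, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ p0
        inliers = int((np.abs(points @ n + d) < thresh).sum())
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (n, d)
    return best_model

def spatial_structure_loss(points):
    """Euclidean distance of every reconstructed point to the fitted structure, summed."""
    n, d = ransac_plane(points)
    return np.abs(points @ n + d).sum()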
(1) on the data set D, the network A and the module P of the network B are each trained 80000 times
Taking out training data from the data set D each time, uniformly scaling to the resolution p x o, inputting the image c into the network A, inputting the image c and the image tau into the network B, and training the module P of the network B, wherein the training loss of each batch is calculated by the internal parameter supervision synthesis loss;
(2) on the data set D, the module Q of the network B is trained 80000 times
Taking out training data from the data set D each time, uniformly scaling to the resolution p x o, inputting the image c into the network A, inputting the image c and the image tau into the network B, and training the module Q of the network B, wherein the training loss of each batch is calculated by the sum of the internal parameter supervision synthesis loss and the internal parameter error loss;
(3) on the data set E, the module Q of the network B is trained 80000 times according to the following steps
Taking out ultrasonic training data from a data set E every time, uniformly scaling the ultrasonic training data to a resolution ratio p x o, inputting an image j into a network A, inputting the image j and the image pi into a network B, and training a module Q of the network B, wherein the training loss of each batch is calculated by the sum of unsupervised synthesis loss and spatial structure error loss;
(4) on a data set E, training the network C 80000 times to obtain a model parameter rho
Taking out ultrasonic training data from the data set E every time, uniformly scaling the ultrasonic training data to the resolution p x o, inputting an image j into the network A, inputting the image j and the image pi into the network B, then taking the output of the network A as the depth, taking the output of the network B as the pose parameters and the camera internal parameters, respectively synthesizing two images at the viewpoint of the image j according to the image i and the image k, splicing the two images according to a color channel, inputting the image j and the spliced image into the network C, and obtaining the optimal network model parameter rho after iteration by continuously modifying the parameters of the network C during training of the network C; the method for calculating the loss psi of each batch is as follows:
Ψ = δ[log(T)] + δ[log(1-T)]    (1)
wherein δ denotes the expectation over the distribution of true (real) and false (synthesized) samples, log(T) is the natural logarithm of T, and T is the output of the network C;
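In code, loss (1) can be written as below; the sign convention (negated so that the quantity is minimized, as stated for all batch losses) and the small epsilon are interpretation choices, and T is assumed to lie in (0, 1), e.g. after a sigmoid in network C.

import tensorflow as tf

def discriminator_loss(t_real, t_fake, eps=1e-8):
    """Loss (1): expectation of log(T) on real pairs plus log(1 - T) on synthesized pairs,
    negated so that minimizing it drives network C to separate the two."""
    return -(tf.reduce_mean(tf.math.log(t_real + eps)) +
             tf.reduce_mean(tf.math.log(1.0 - t_fake + eps)))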
(5) training the network C 80000 times on a data set G to obtain model parameters rho'
Taking out CT training data from a data set G each time, uniformly scaling to a resolution ratio p x o, inputting an image m into a network A, inputting the image m and an image sigma into a network B, then taking the output of the network A as a depth, taking the output of the network B as a pose parameter and an internal parameter of a camera, respectively synthesizing two images at a viewpoint of the image m according to an image l and an image n, splicing the two images according to a color channel, inputting the image m and the spliced image into a network C, and obtaining an optimal network model parameter rho' after iteration by continuously modifying the parameter of the network C during training of the network C; calculating the loss of each batch as the sum of unsupervised synthesis loss, spatial structure error loss and translational motion loss Y of the camera, wherein Y is obtained by the output pose parameter of the network B and the constraint calculation of the translational motion of the camera;
and 4, step 4: three-dimensional reconstruction of ultrasound or CT images
Using a self-sampled ultrasonic or CT sequence image, each frame is uniformly scaled to the resolution p x o, and prediction is performed with the model parameter rho for an ultrasonic sequence or the model parameter rho' for a CT sequence. For an ultrasonic sequence image, the image j is input into the network A and the image j and the image pi are input into the network B; for a CT sequence image, the image m is input into the network A and the image m and the image sigma are input into the network B. The output of the network A is used as the depth, and the output of the network B is used as the pose parameters and the camera internal parameters. Key frames are then selected as follows: the first frame in the sequence is taken as the current key frame, and each frame in the sequence image is taken in turn as the target frame; an image at the viewpoint of the target frame is synthesized from the current key frame by using the camera pose parameters and internal parameters, and the error lambda is calculated as the sum of the pixel-by-pixel, color-by-color channel intensity differences between the synthesized image and the target frame; an image at the viewpoint of the target frame is also synthesized from the adjacent frames of the target frame by using the camera pose parameters and internal parameters, and the error gamma is calculated as the sum of the pixel-by-pixel, color-by-color channel intensity differences between the synthesized image and the target frame; the synthesis error ratio Z is further calculated by formula (2), and when Z is greater than a threshold eta, wherein 1 < eta < 2, the current key frame is updated to the current target frame;
(Formula (2): the synthesis error ratio Z, computed from the errors lambda and gamma)
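A sketch of the key-frame selection loop follows; since formula (2) is only reproduced as an image above, the ratio Z = lambda / gamma is an assumption, and synth_from_key / synth_from_neighbours stand for hypothetical view-synthesis helpers built from the predicted depth, pose and intrinsics.

import numpy as np

def select_keyframes(frames, synth_from_key, synth_from_neighbours, eta=1.2):
    """Return the indices of the selected key frames (1 < eta < 2)."""
    err = lambda a, b: np.abs(a.astype(np.float64) - b.astype(np.float64)).sum()
    keyframes, key = [0], frames[0]
    for idx in range(1, len(frames)):
        target = frames[idx]
        lam = err(target, synth_from_key(key, target))          # error from the current key frame
        gam = err(target, synth_from_neighbours(frames, idx))   # error from the adjacent frames
        if lam / max(gam, 1e-12) > eta:                         # Z > eta: update the key frame
            keyframes.append(idx)
            key = target
    return keyframes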
for any target frame, the resolution is scaled to M multiplied by N, the three-dimensional coordinates in the camera coordinate system of each pixel of each frame of image are calculated according to the internal parameters of the camera and the reconstruction algorithm of computer vision, furthermore, the viewpoint of the first frame is used as the origin of the world coordinate system, and the three-dimensional coordinates in the world coordinate system of each pixel of each frame of image of the sequence are calculated by utilizing the geometric transformation of three-dimensional space and combining the pose parameters of all key frames.
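The final reconstruction step, back-projecting every pixel with the camera internal parameters and then transferring the points into the world frame anchored at the first frame's viewpoint, can be sketched as follows; the accumulation of the key-frame pose parameters into the 4 x 4 matrix T_world_cam is assumed.

import numpy as np

def backproject_to_world(depth, K, T_world_cam):
    """Per-pixel 3-D coordinates: first in the camera frame, then in the world frame.

    depth: H x W (scaled to the target resolution M x N), K: 3 x 3 intrinsics,
    T_world_cam: 4 x 4 pose of this frame's camera in the world frame whose
    origin is the viewpoint of the first frame.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)          # camera-frame coordinates
    cam = np.vstack([cam, np.ones((1, cam.shape[1]))])
    world = (T_world_cam @ cam)[:3]                              # world-frame coordinates
    return world.T.reshape(H, W, 3)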
The method can effectively realize the three-dimensional reconstruction of the ultrasonic or CT image, and can show the slice image with a 3D visual effect in the auxiliary diagnosis of artificial intelligence, thereby improving the auxiliary diagnosis efficiency.
Drawings
FIG. 1 is a three-dimensional reconstruction result of an ultrasound image of the present invention;
FIG. 2 is a three-dimensional reconstruction result of the CT image of the present invention.
Detailed Description
Examples
The invention is further described below with reference to the accompanying drawings.
The embodiment is implemented under a Windows 10 64-bit operating system on a PC, with the hardware configuration CPU i7-9700F, 16 GB memory and GPU NVIDIA GeForce GTX 2070 8G. The deep learning library is TensorFlow 1.14, and the programming language is Python.
A three-dimensional reconstruction method of ultrasonic or CT medical images is disclosed, wherein an ultrasonic or CT image sequence is input, the resolution ratio is M multiplied by N, for an ultrasonic image, M is 450, N is 300, for a CT image, M and N are both 512, and the three-dimensional reconstruction process specifically comprises the following steps:
step 1: building a data set
(a) Constructing a natural image dataset D
Selecting a natural image website, requiring image sequences and corresponding internal parameters of a camera, downloading 19 image sequences and the corresponding internal parameters of the sequences from the natural image website, recording every adjacent 3 frames of images as an image b, an image c and an image d for each image sequence, splicing the image b and the image d according to a color channel to obtain an image tau, forming a data element by the image c and the image tau, wherein the image c is a natural target image, the sampling viewpoint of the image c is a target viewpoint, and the internal parameters of the image b, the image c and the image d are all e_t (t = 1, 2, 3, 4), wherein e_1 is the horizontal focal length, e_2 is a vertical focal length, and e_3 and e_4 are two components of the principal point coordinates; if the last remaining image in the same image sequence is less than 3 frames, discarding; constructing a data set D by using all the sequences, wherein the data set D has 3600 elements;
(b) constructing an ultrasound image dataset E
Sampling 10 ultrasonic image sequences, recording 3 adjacent frames of images as an image i, an image j and an image k for each sequence, splicing the image i and the image k according to a color channel to obtain an image pi, forming a data element by the image j and the image pi, wherein the image j is an ultrasonic target image, a sampling viewpoint of the image j is used as a target viewpoint, if the last remaining images in the same image sequence are less than 3 frames, discarding the images, and constructing a data set E by using all the sequences, wherein the data set E has 1600 elements;
(c) construction of a CT image dataset G
Sampling 1 CT image sequence, regarding the sequence, marking every adjacent 3 frames as an image l, an image m and an image n, splicing the image l and the image n according to a color channel to obtain an image sigma, forming a data element by the image m and the image sigma, wherein the image m is a CT target image, a sampling viewpoint of the image m is used as a target viewpoint, if the last residual image in the same image sequence is less than 3 frames, discarding, and constructing a data set G by using all the sequences, wherein the data set G has 2000 elements;
Step 2: constructing neural networks
The resolution of the image or video processed by the neural network is 416 × 128, 416 is the width, 128 is the height, and the pixel is taken as the unit;
(1) structure of network A
Taking tensor H as input, the scale is 16 multiplied by 128 multiplied by 416 multiplied by 3, taking tensor I as output, and the scale is 16 multiplied by 128 multiplied by 416 multiplied by 1;
the network A consists of an encoder and a decoder, and for the tensor H, the output tensor I is obtained after the encoding and decoding processing is carried out in sequence;
the encoder consists of 5 residual error units, the 1 st to 5 th units respectively comprise 2, 3, 4, 6 and 3 residual error modules, each residual error module performs convolution for 3 times, for the 5 residual error units, the number of convolution kernels is respectively 64, 64, 128, 256 and 512, wherein a maximum pooling layer is included behind the first residual error unit;
the decoder is composed of 6 decoding units, each decoding unit comprises two steps of deconvolution and convolution, the shapes and the numbers of convolution kernels of the deconvolution and convolution are the same, the shapes of convolution kernels of the 1 st to 6 th decoding units are all 3 x 3, the numbers of the convolution kernels are 512, 256, 128, 64, 32 and 16 respectively, cross-layer connection is carried out between network layers of the encoder and the decoder, and the corresponding relation of the cross-layer connection is as follows: 1 and 4, 2 and 3, 3 and 2, 4 and 1;
(2) structure of network B
Tensor J and tensor K are used as inputs, the scales are respectively 16 × 128 × 416 × 3 and 16 × 128 × 416 × 6, tensor L and tensor O are used as outputs, and the scales are respectively 16 × 2 × 6 and 16 × 4 × 1;
the network B is composed of a module P and a module Q, 11 layers of convolution units are shared, firstly, a tensor J and a tensor K are spliced according to a last channel to obtain a tensor with the dimension of 16 multiplied by 128 multiplied by 416 multiplied by 9, and an output tensor L and a tensor O are respectively obtained after the tensor is processed by the module P and the module Q;
the module Q and the module P share a front 4-layer convolution unit, wherein the front 4-layer convolution unit has the structure that the convolution kernel scales in the front two-layer convolution unit are respectively 7 multiplied by 7 and 5 multiplied by 5, the convolution kernel scales from the 3 rd layer to the 4 th layer are all 3 multiplied by 3, and the number of the convolution kernels from 1 layer to 4 layers is 16, 32, 64 and 128 in sequence;
for the module P, except for sharing 4 layers, the module P occupies convolution units from the 5 th layer to the 7 th layer of the network B, the scale of convolution kernels is 3 multiplied by 3, the number of the convolution kernels is 256, after the convolution processing is carried out on the processing result of the 7 th layer by using 12 convolution kernels of 3 multiplied by 3, the 12 results are sequentially arranged into 2 rows, and the result of the tensor L is obtained;
for the module Q, except for 1 to 4 layers of the shared network B, 8 th to 11 th layers of convolution units of the network B are occupied, 2 nd layer output of the network B is used as 8 th layer input of the network B, the shapes of convolution kernels in the 8 th to 11 th layers of convolution units are all 3 multiplied by 3, the number of the convolution kernels is all 256, and after convolution processing is carried out on the 11 th layer result by using 4 convolution kernels of 3 multiplied by 3, tensor O results are obtained from 4 channels;
(3) structure of network C
Tensor R and tensor S are used as input, the scales are respectively 16 × 128 × 416 × 3 and 16 × 128 × 416 × 6, tensor T is used as output, and the scale is 16 × 1;
the network C is used as a discriminator and is composed of 5 layers of convolution units, firstly, a tensor R and a tensor S are spliced according to a last channel to obtain a tensor with the dimension of 16 multiplied by 128 multiplied by 416 multiplied by 9, and the tensor is processed by the following 5 layers of convolution units to obtain an output tensor T;
the 1 st to 5 th layers of convolution units of the network C respectively comprise 1 time of convolution processing and 1 time of activation processing, the scales of convolution kernels are all 3 multiplied by 3, the number of the convolution kernels in the 1 st to 5 th layers of convolution units is respectively 64, 128, 256, 512 and 1, and tensor T is obtained from the 5 th layer of results;
and step 3: training of neural networks
Respectively dividing samples in a data set D, a data set E and a data set G into a training set and a test set according to a ratio of 9:1, wherein data in the training set is used for training, data in the test set is used for testing, training data are respectively obtained from the corresponding data sets during training in the following steps, the training data are uniformly scaled to have a resolution of 416 multiplied by 128 and input into a corresponding network, iterative optimization is carried out, and loss of each batch is minimized by continuously modifying parameters of a network model;
in the training process, the calculation method of each loss is as follows:
internal parameter supervision synthesis loss: in the network model parameter training of the natural image, the output tensor I of the network A is taken as the depth, and the output result L of the network B and the internal parameter label e_t (t = 1, 2, 3, 4) of the training data are respectively used as the pose parameters and the camera internal parameters; two images at the viewpoint of the image c are respectively synthesized by using the image b and the image d according to a computer vision algorithm, and the loss is respectively calculated by using the image c and the two images according to the sum of the intensity differences of pixel-by-pixel and color-by-color channels;
unsupervised synthesis loss: in the network model parameter training of ultrasonic or CT images, the output tensor I of a network A is used as depth, the output tensor L and the tensor O of a network B are respectively used as pose parameters and camera internal parameters, images at the viewpoint of a target image are respectively synthesized by using two adjacent images of the target image according to a computer vision algorithm, and the target image and the images at the viewpoint of the target image are respectively obtained by calculation according to the sum of the intensity differences of pixel-by-pixel and color-by-color channels;
internal parameter error loss: calculated by utilizing the output result O of the network B and the internal parameter label e_t (t = 1, 2, 3, 4) of the training data, as the sum of the absolute values of the differences of the components;
spatial structure error loss: in the network model parameter training of an ultrasonic or CT image, the output tensor I of the network A is used as depth, the output tensor L and the tensor O of the network B are respectively used as pose parameters and camera internal parameters, the target image is reconstructed by taking the viewpoint of the target image as the origin of a camera coordinate system according to a computer vision algorithm, a RANSAC algorithm is adopted to fit a spatial structure to reconstruction points, and the Euclidean distance from each reconstruction point of the target image to a spatial geometric structure is calculated;
(1) on the data set D, the network A and the module P of the network B are each trained 80000 times
Taking out training data from the data set D each time, uniformly scaling the training data to a resolution of 416 multiplied by 128, inputting the image c into the network A, inputting the image c and the image tau into the network B, and training the module P of the network B, wherein the training loss of each batch is obtained by calculating the internal parameter supervision synthesis loss;
(2) on the data set D, the module Q of the network B is trained 80000 times
Taking out training data from the data set D each time, uniformly scaling the training data to a resolution of 416 multiplied by 128, inputting the image c into the network A, inputting the image c and the image tau into the network B, and training the module Q of the network B, wherein the training loss of each batch is calculated by the sum of the supervised synthesis loss of internal parameters and the error loss of the internal parameters;
(3) on the data set E, the module Q of the network B is trained 80000 times according to the following steps
Taking out ultrasonic training data from a data set E every time, uniformly scaling the ultrasonic training data to a resolution of 416 x 128, inputting an image j into a network A, inputting the image j and the image pi into a network B, and training a module Q of the network B, wherein the training loss of each batch is calculated by the sum of unsupervised synthesis loss and spatial structure error loss;
(4) on a data set E, training the network C 80000 times to obtain a model parameter rho
Taking out ultrasonic training data from a data set E every time, uniformly scaling the ultrasonic training data to a resolution of 416 x 128, inputting an image j into a network A, inputting an image j and an image pi into a network B, then taking the output of the network A as a depth, taking the output of the network B as a pose parameter and an internal parameter of a camera, respectively synthesizing two images at a viewpoint of the image j according to an image i and an image k, splicing the two images according to a color channel, inputting the image j and the spliced image into a network C, and obtaining an optimal network model parameter rho after iteration by continuously modifying the parameter of the network C during training of the network C; the method for calculating the loss psi of each batch comprises the following steps:
Ψ = δ[log(T)] + δ[log(1-T)]    (1)
wherein δ denotes the expectation over the distribution of true (real) and false (synthesized) samples, log(T) is the natural logarithm of T, and T is the output of the network C;
(5) training the network C 80000 times on a data set G to obtain model parameters rho'
Taking out CT training data from a data set G each time, uniformly zooming to a resolution of 416 x 128, inputting an image m into a network A, inputting the image m and an image sigma into a network B, then taking the output of the network A as a depth, taking the output of the network B as a pose parameter and a camera internal parameter, respectively synthesizing two images at a viewpoint of the image m according to an image l and an image n, splicing the two images according to a color channel, inputting the image m and the spliced image into a network C, and obtaining an optimal network model parameter rho' after iteration by continuously modifying the parameter of the network C during training of the network C; calculating the loss of each batch as the sum of unsupervised synthesis loss, spatial structure error loss and translational motion loss Y of the camera, wherein Y is calculated by the output pose parameter of the network B according to the constraint of the translational motion of the camera;
and 4, step 4: three-dimensional reconstruction of ultrasound or CT images
Using a self-sampled ultrasonic or CT sequence image, each frame is uniformly scaled to the resolution 416 x 128, and prediction is performed with the model parameter rho for an ultrasonic sequence or the model parameter rho' for a CT sequence. For an ultrasonic sequence image, the image j is input into the network A and the image j and the image pi are input into the network B; for a CT sequence image, the image m is input into the network A and the image m and the image sigma are input into the network B. The output of the network A is used as the depth, and the output of the network B is used as the pose parameters and the camera internal parameters. Key frames are then selected as follows: the first frame in the sequence is taken as the current key frame, and each frame in the sequence image is taken in turn as the target frame; an image at the viewpoint of the target frame is synthesized from the current key frame by using the camera pose parameters and internal parameters, and the error lambda is calculated as the sum of the pixel-by-pixel, color-by-color channel intensity differences between the synthesized image and the target frame; an image at the viewpoint of the target frame is also synthesized from the adjacent frames of the target frame by using the camera pose parameters and internal parameters, and the error gamma is calculated as the sum of the pixel-by-pixel, color-by-color channel intensity differences between the synthesized image and the target frame; the synthesis error ratio Z is further calculated by formula (2), and when Z is greater than the threshold 1.2, the current key frame is updated to the current target frame;
(Formula (2): the synthesis error ratio Z, computed from the errors lambda and gamma)
for any target frame, the resolution ratio is scaled to M multiplied by N, for ultrasonic images, M is 450, N is 300, for CT images, M and N are 512, according to the internal parameters of a camera, the three-dimensional coordinates in the camera coordinate system of each pixel of each frame of image are calculated according to the reconstruction algorithm of computer vision, further, the viewpoint of the first frame is used as the origin of the world coordinate system, and the three-dimensional coordinates in the world coordinate system of each pixel of each frame of image of the sequence are calculated by combining the pose parameters of all key frames and utilizing three-dimensional space geometric transformation.
In the embodiment, the experimental hyper-parameters are as follows: the Adam optimizer is adopted, the learning rate of each network is 0.0002, and the momentum coefficient is 0.9.
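With the TensorFlow 1.14 environment of the embodiment, these hyper-parameters map onto the optimizer roughly as follows; reading the "momentum coefficient" as Adam's beta1 is an assumption.

import tensorflow as tf

# Adam optimizer with learning rate 0.0002; the momentum coefficient 0.9 is taken as beta1.
optimizer = tf.train.AdamOptimizer(learning_rate=0.0002, beta1=0.9)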
In this embodiment, a test is performed on the data set E. 7 samples are selected for the experiment, the synthesis errors of the different samples under the constraint of the spatial geometric condition are counted, and the synthesis errors are calculated by using formula (1); the synthesis errors obtained for the 7 sequences are respectively 0.291, 0.221, 0.172, 0.149, 0.221, 0.131 and 0.307. In the experiment, in order to visualize the three-dimensional reconstruction results, the ultrasound images were segmented using DenseNet to generate the 3D reconstruction results; FIG. 1 shows the three-dimensional reconstruction result of an ultrasound image, from which the effectiveness of the method can be seen.
A test is also carried out on the data set G. The sequence images formed by 400 sampled images are divided into 10 groups for the experiment, the synthesis errors of the different samples under the constraint of the spatial geometric condition are counted, and the synthesis errors are calculated by using formula (1); the synthesis errors obtained for the 10 sequences are respectively 0.157, 0.191, 0.196, 0.197, 0.217, 0.203, 0.243, 0.301 and 0.246. In order to visualize the three-dimensional reconstruction results, the CT images were segmented using DenseNet to generate the 3D reconstruction results; FIG. 2 shows the three-dimensional reconstruction result of a CT image, from which the effectiveness of the method of the invention for CT reconstruction can be seen.

Claims (1)

1. A three-dimensional reconstruction method of ultrasonic or CT medical images is characterized in that the method inputs an ultrasonic or CT image sequence, the image resolution is MxN, M is more than or equal to 100 and less than or equal to 1500, N is more than or equal to 100 and less than or equal to 1500, and the three-dimensional reconstruction process specifically comprises the following steps:
step 1: building a data set
(a) Constructing a natural image dataset D
Selecting a natural image website, requiring image sequences and corresponding camera internal parameters, downloading a image sequences and internal parameters corresponding to the sequences from the natural image website, wherein a is more than or equal to 1 and less than or equal to 20, recording 3 adjacent frames of images as an image b, an image c and an image d for each image sequence, splicing the image b and the image d according to a color channel to obtain an image tau, forming a data element by the image c and the image tau, wherein the image c is a natural target image, the sampling viewpoint of the image c is used as a target viewpoint, and the internal parameters of the image b, the image c and the image d are all e_t (t = 1, 2, 3, 4), wherein e_1 is a horizontal focal length, e_2 is a vertical focal length, and e_3 and e_4 are two components of the principal point coordinates; if the last residual image in the same image sequence is less than 3 frames, discarding; constructing a data set D by using all the sequences, wherein the data set D has f elements, and f is more than or equal to 3000 and less than or equal to 20000;
(b) constructing an ultrasound image dataset E
Sampling g ultrasonic image sequences, wherein g is more than or equal to 1 and less than or equal to 20, recording every adjacent 3 frames of images of each sequence as an image i, an image j and an image k, splicing the image i and the image k according to a color channel to obtain an image pi, forming a data element by the image j and the image pi, wherein the image j is an ultrasonic target image, and a sampling viewpoint of the image j is used as a target viewpoint;
(c) construction of a CT image dataset G
Sampling h CT image sequences, wherein h is more than or equal to 1 and less than or equal to 20, recording every adjacent 3 frames of each sequence as an image l, an image m and an image n, splicing the image l and the image n according to a color channel to obtain an image sigma, forming a data element by the image m and the image sigma, taking the image m as a CT target image, taking a sampling viewpoint of the image m as a target viewpoint, if the last residual image in the same image sequence is less than 3 frames, abandoning, and constructing a data set G by using all the sequences, wherein the data set G has xi elements, and xi is more than or equal to 1000 and less than or equal to 20000;
Step 2: constructing neural networks
The resolution of the image or video processed by the neural network is p x o, p is the width, o is the height, and the resolution is 100-2000, 100-2000;
(1) structure of network A
Taking tensor H as input, the scale is alpha multiplied by o multiplied by p multiplied by 3, taking tensor I as output, the scale is alpha multiplied by o multiplied by p multiplied by 1, and alpha is the number of batches;
the network A consists of an encoder and a decoder, and for the tensor H, the output tensor I is obtained after the encoding and decoding processing is carried out in sequence;
the encoder consists of 5 residual error units, the 1 st to 5 th units respectively comprise 2, 3, 4, 6 and 3 residual error modules, each residual error module performs convolution for 3 times, the shapes of convolution kernels are 3 multiplied by 3, the number of the convolution kernels is 64, 64, 128, 256 and 512, and a maximum pooling layer is included behind the first residual error unit;
the decoder is composed of 6 decoding units, each decoding unit comprises two steps of deconvolution and convolution, the shapes and the numbers of convolution kernels of the deconvolution and convolution are the same, the shapes of convolution kernels of the 1 st to 6 th decoding units are all 3 x 3, the numbers of the convolution kernels are 512, 256, 128, 64, 32 and 16 respectively, cross-layer connection is carried out between network layers of the encoder and the decoder, and the corresponding relation of the cross-layer connection is as follows: 1 and 4, 2 and 3, 3 and 2, 4 and 1;
(2) structure of network B
Tensor J and tensor K are used as input, the scales are respectively alpha multiplied by o multiplied by p multiplied by 3 and alpha multiplied by o multiplied by p multiplied by 6; tensor L and tensor O are used as output, the scales are respectively alpha multiplied by 2 multiplied by 6 and alpha multiplied by 4 multiplied by 1, and alpha is the number of batches;
the network B is composed of a module P and a module Q, 11 layers of convolution units are shared, firstly, a tensor J and a tensor K are spliced according to a last channel to obtain a tensor with the scale of alpha multiplied by O multiplied by P multiplied by 9, and an output tensor L and a tensor O are respectively obtained after the tensor is processed by the module P and the module Q;
the module Q and the module P share the first 4 layers of convolution units, and the structure of the first 4 layers of convolution units is as follows: the convolution kernel scales in the first two layers of units are respectively 7 multiplied by 7 and 5 multiplied by 5, the convolution kernel scales from the 3 rd layer to the 4 th layer are all 3 multiplied by 3, and the number of the convolution kernels from 1 layer to 4 layers is 16, 32, 64 and 128 in sequence;
for the module P, except for sharing 4 layers, the module P occupies convolution units from the 5 th layer to the 7 th layer of the network B, the scale of convolution kernels is 3 multiplied by 3, the number of the convolution kernels is 256, after the convolution processing is carried out on the processing result of the 7 th layer by using 12 convolution kernels of 3 multiplied by 3, the 12 results are sequentially arranged into 2 rows, and the result of the tensor L is obtained;
for the module Q, except for 1 to 4 layers of the shared network B, 8 th to 11 th layers of convolution units of the network B are occupied, 2 nd layer output of the network B is used as 8 th layer input of the network B, the shapes of convolution kernels in the 8 th to 11 th layers of convolution units are all 3 multiplied by 3, the number of the convolution kernels is all 256, and after convolution processing is carried out on the 11 th layer result by using 4 convolution kernels of 3 multiplied by 3, tensor O results are obtained from 4 channels;
(3) structure of network C
Taking tensor R and tensor S as input, wherein the scales are respectively alpha multiplied by o multiplied by p multiplied by 3 and alpha multiplied by o multiplied by p multiplied by 6, taking tensor T as output, the scale is alpha multiplied by 1, and alpha is the number of batches;
the network C is used as a discriminator and is composed of 5 layers of convolution units, firstly, a tensor R and a tensor S are spliced according to a last channel to obtain a tensor with the scale of alpha multiplied by o multiplied by p multiplied by 9, and the tensor is processed by the following 5 layers of convolution units to obtain an output tensor T;
the 1 st to 5 th layers of convolution units of the network C respectively comprise 1 time of convolution processing and 1 time of activation processing, the scales of convolution kernels are all 3 multiplied by 3, the numbers of the convolution kernels in the 1 st to 5 th layers of convolution units are respectively 64, 128, 256, 512 and 1, and tensor T is obtained from the 5 th layer of results;
and step 3: training of neural networks
Respectively dividing samples in a data set D, a data set E and a data set G into a training set and a testing set according to a ratio of 9:1, wherein data in the training set is used for training, data in the testing set is used for testing, training data are respectively obtained from the corresponding data sets when the following steps are trained, the training data are uniformly scaled to a resolution ratio p x o and input into a corresponding network, iterative optimization is carried out, and loss of each batch is minimized by continuously modifying network model parameters;
in the training process, the calculation method of each loss is as follows:
internal parameter supervision synthesis loss: in the network model parameter training of the natural image, the output tensor I of the network A is taken as the depth, and the output result L of the network B and the internal parameter label e_t (t = 1, 2, 3, 4) of the training data are respectively used as the pose parameters and the camera internal parameters; two images at the viewpoint of the image c are respectively synthesized by using the image b and the image d according to a computer vision algorithm, and the loss is respectively calculated by using the image c and the two images according to the sum of the intensity differences of pixel-by-pixel and color-by-color channels;
unsupervised synthesis loss: in the network model parameter training of ultrasonic or CT images, the output tensor I of a network A is used as depth, the output tensor L and the tensor O of a network B are respectively used as pose parameters and camera internal parameters, images at the viewpoint of a target image are respectively synthesized by using two adjacent images of the target image according to a computer vision algorithm, and the target image and the images at the viewpoint of the target image are respectively obtained by calculation according to the sum of the intensity differences of pixel-by-pixel and color-by-color channels;
internal parameter error loss: calculated by utilizing the output result O of the network B and the internal parameter label e_t (t = 1, 2, 3, 4) of the training data, as the sum of the absolute values of the differences of the components;
spatial structure error loss: in the network model parameter training of ultrasonic or CT images, the output tensor I of a network A is used as depth, the output tensor L and the output tensor O of a network B are respectively used as pose parameters and camera internal parameters, a target image is reconstructed by taking the viewpoint of the target image as the origin of a camera coordinate system according to a computer vision algorithm, a RANSAC algorithm is adopted to fit a spatial structure of reconstruction points, and Euclidean distance from each reconstruction point of the target image to the spatial geometric structure is calculated;
(1) on the data set D, the network A and the module P of the network B are each trained 80000 times
Taking out training data from the data set D each time, uniformly scaling to a resolution p x o, inputting the image c into the network A, inputting the image c and the image tau into the network B, training the module P of the network B, and calculating the training loss of each batch by the internal parameter supervision synthesis loss;
(2) on the data set D, the module Q of the network B is trained 80000 times
Taking out training data from the data set D each time, uniformly scaling to a resolution p x o, inputting the image c into the network A, inputting the image c and the image tau into the network B, and training the module Q of the network B, wherein the training loss of each batch is obtained by calculating the sum of the internal parameter supervision synthesis loss and the internal parameter error loss;
(3) on the data set E, the module Q of the network B is trained 80000 times according to the following steps
Taking out ultrasonic training data from a data set E every time, uniformly scaling to a resolution p x o, inputting an image j into a network A, inputting the image j and the image pi into a network B, training a module Q of the network B, and calculating the training loss of each batch by the sum of unsupervised synthesis loss and spatial structure error loss;
(4) training the network C 80000 times on a data set E to obtain a model parameter rho
Taking out ultrasonic training data from a data set E every time, uniformly scaling the ultrasonic training data to a resolution p x o, inputting an image j into a network A, inputting an image j and an image pi into a network B, then taking the output of the network A as a depth, taking the output of the network B as a pose parameter and an internal parameter of a camera, respectively synthesizing two images at the viewpoint of the image j according to an image i and an image k, splicing the two images according to a color channel, inputting the image j and the spliced image into a network C, and obtaining an optimal network model parameter rho after iteration by continuously modifying the parameter of the network C during training of the network C; the method for calculating the loss psi of each batch comprises the following steps:
Ψ=δ[log(T)]+δ[log(1-T)] (1)
wherein δ denotes the expectation over the distribution of true (real) and false (synthesized) samples, log(T) is the natural logarithm of T, and T is the output of the network C;
(5) training the network C 80000 times on a data set G to obtain model parameters rho'
Taking out CT training data from a data set G each time, uniformly scaling to a resolution ratio p x o, inputting an image m into a network A, inputting the image m and an image sigma into a network B, then taking the output of the network A as a depth, taking the output of the network B as a pose parameter and an internal parameter of a camera, respectively synthesizing two images at a viewpoint of the image m according to an image l and an image n, splicing the two images according to a color channel, inputting the image m and the spliced image into a network C, and obtaining an optimal network model parameter rho' after iteration by continuously modifying the parameter of the network C during training of the network C; the loss of each batch is the sum of unsupervised synthesis loss, spatial structure error loss and translational motion loss Y of the camera, wherein Y is obtained by output pose parameters of the network B and constraint calculation of the translational motion of the camera;
and 4, step 4: three-dimensional reconstruction of ultrasound or CT images
Using a self-sampled ultrasonic or CT sequence image, uniformly scaling each frame of image to resolution p x o, using model parameter rho or model parameter rho' to predict, inputting image j to network A, inputting image j and image pi to network B for the ultrasonic sequence image, inputting image m to network A, inputting image m and image sigma to network B, using the output of network A as depth, using the output of network B as pose parameter and camera internal parameter, selecting key frames according to the following steps, using the first frame in the sequence as current key frame, using each frame in the sequence image as target frame, synthesizing the image at the viewpoint of the target frame according to the current key frame, using camera pose parameter and internal parameter, calculating error lambda by using the sum of pixel-by-pixel color channel intensity difference between the synthesized image and the target frame, synthesizing an image at a viewpoint of a target frame by using pose parameters and internal parameters of a camera according to adjacent frames of the target frame, calculating an error gamma by using the sum of pixel-by-pixel and color-by-color channel intensity differences between the synthesized image and the target frame, further calculating a synthesis error ratio Z by using a formula (2), and updating the current key frame into the current target frame when the Z is greater than a threshold eta, wherein eta is more than 1 and less than 2;
Z=λ/γ (2)
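The key-frame selection rule can be sketched as follows, under the assumption that formula (2) is the ratio Z = λ/γ of the two synthesis errors (reconstructed above from the claim's description of Z as a "synthesis error ratio"); synth_from_key and synth_from_neighbors are placeholder callables that perform the view synthesis with the predicted depth, pose and intrinsic parameters, and η = 1.5 is an arbitrary value inside the stated interval (1, 2).

import numpy as np

def photometric_error(synth, target):
    # Sum of per-pixel, per-color-channel intensity differences.
    return np.abs(synth.astype(np.float32) - target.astype(np.float32)).sum()

def select_keyframes(frames, synth_from_key, synth_from_neighbors, eta=1.5):
    # frames: list of H x W x 3 arrays; synth_from_key(k, t) synthesizes the
    # target frame t from key frame k, synth_from_neighbors(t) from t's
    # adjacent frames (both are illustrative placeholders).
    key_idx = 0                # the first frame is the initial key frame
    keys = [0]
    for t in range(1, len(frames)):
        lam = photometric_error(synth_from_key(key_idx, t), frames[t])
        gam = photometric_error(synth_from_neighbors(t), frames[t])
        z = lam / max(gam, 1e-8)
        if z > eta:            # synthesis from the key frame has degraded enough
            key_idx = t
            keys.append(t)
    return keys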
and further, taking the viewpoint of the first frame as the origin of the world coordinate system and combining the pose parameters of all key frames, calculating by three-dimensional geometric transformation, according to the computer-vision reconstruction algorithm, the three-dimensional coordinates in the world coordinate system of every pixel of every frame of the sequence.
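As an illustration of this final back-projection step, the sketch below lifts every pixel of one frame into the world coordinate system from its predicted depth map, the intrinsic matrix K output by network B, and a 4x4 camera-to-world pose T_cw accumulated from the key-frame pose parameters (first-frame viewpoint as the world origin); the names and the NumPy implementation are illustrative, not part of the claim.

import numpy as np

def pixels_to_world(depth, K, T_cw):
    # depth: H x W predicted depth map; K: 3 x 3 intrinsic matrix from network B;
    # T_cw: 4 x 4 camera-to-world pose accumulated from the key-frame poses,
    # with the first-frame viewpoint as the world origin.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(np.float64)
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)       # camera coordinates
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])      # homogeneous form
    world = (T_cw @ cam_h)[:3]                                # world coordinates
    return world.T.reshape(h, w, 3)

Applying such a back-projection to every frame with its corresponding key-frame pose yields the reconstructed point set of the whole sequence.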
CN202011623243.2A 2020-12-30 2020-12-30 Ultrasonic or CT medical image three-dimensional reconstruction method Active CN112734907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011623243.2A CN112734907B (en) 2020-12-30 2020-12-30 Ultrasonic or CT medical image three-dimensional reconstruction method

Publications (2)

Publication Number Publication Date
CN112734907A CN112734907A (en) 2021-04-30
CN112734907B (en) 2022-07-08

Family

ID=75609341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011623243.2A Active CN112734907B (en) 2020-12-30 2020-12-30 Ultrasonic or CT medical image three-dimensional reconstruction method

Country Status (1)

Country Link
CN (1) CN112734907B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113838210A (en) * 2021-09-10 2021-12-24 Northwestern Polytechnical University Method and device for converting an ultrasound image into a 3D model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017223560A1 (en) * 2016-06-24 2017-12-28 Rensselaer Polytechnic Institute Tomographic image reconstruction via machine learning
CN107563983A (en) * 2017-09-28 2018-01-09 Shanghai United Imaging Healthcare Co., Ltd. Image processing method and medical imaging device
CN109003325A (en) * 2018-06-01 2018-12-14 NetEase (Hangzhou) Network Co., Ltd. Three-dimensional reconstruction method, medium, apparatus and computing device
CN109745062A (en) * 2019-01-30 2019-05-14 Tencent Technology (Shenzhen) Co., Ltd. CT image generation method, apparatus, device and storage medium
CN111310736A (en) * 2020-03-26 2020-06-19 Shanghai Tongyan Civil Engineering Technology Co., Ltd. Rapid identification method for unloading and piling of vehicles in a protected area

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7134687B2 (en) * 2018-04-20 2022-09-12 Canon Medical Systems Corporation X-ray diagnostic apparatus, medical image processing apparatus, and medical image diagnostic apparatus


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
3D Deep Learning on Medical Images: A Review; Satya P. Singh, et al.; Sensors; 2020-09-07; pp. 1-24 *
Isotropic Reconstruction of 3D EM Images with Unsupervised Degradation Learning; Shiyu Deng, et al.; Springer Nature Switzerland AG 2020; 2020-09-29; pp. 163-173 *
Research on Deep-Learning-Based Computed Tomography Reconstruction Methods; Guo Heng; China Master's Theses Full-text Database (Information Science and Technology); 2020-06-15; I138-884 *
Intelligent Analysis of Cardiovascular Image Morphology and Function; Cao Yankun; China Master's Theses Full-text Database (Medicine and Health Sciences); 2019-09-15; E062-51 *

Also Published As

Publication number Publication date
CN112734907A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN112767532B (en) Ultrasonic or CT medical image three-dimensional reconstruction method based on transfer learning
Fu et al. Bidirectional 3D quasi-recurrent neural network for hyperspectral image super-resolution
CN112308860A (en) Earth observation image semantic segmentation method based on self-supervised learning
CN112819876B (en) Monocular vision depth estimation method based on deep learning
CN113689545B (en) 2D-to-3D end-to-end ultrasound or CT medical image cross-modal reconstruction method
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN112734907B (en) Ultrasonic or CT medical image three-dimensional reconstruction method
CN112906813A (en) Flotation condition identification method based on density clustering and capsule neural network
Wang et al. Cascaded attention guidance network for single rainy image restoration
CN112700535B (en) Ultrasonic image three-dimensional reconstruction method for intelligent medical auxiliary diagnosis
Gao et al. Bayesian image super-resolution with deep modeling of image statistics
CN112734906B (en) Three-dimensional reconstruction method of ultrasonic or CT medical image based on knowledge distillation
CN112700534B (en) Ultrasonic or CT medical image three-dimensional reconstruction method based on feature migration
CN115861384B (en) Optical flow estimation method and system based on countermeasure and attention mechanism generation
CN118015396A (en) Unsupervised medical image organ segmentation model-based pre-training method
CN113689542B (en) Ultrasonic or CT medical image three-dimensional reconstruction method based on self-attention Transformer
CN113689548B (en) Medical image three-dimensional reconstruction method based on mutual-attention Transformer
Liu et al. Diverse hyperspectral remote sensing image synthesis with diffusion models
CN113689544B (en) Cross-view geometric constraint medical image three-dimensional reconstruction method
CN111275751A (en) Unsupervised absolute scale calculation method and system
CN113689546B (en) Cross-modal three-dimensional reconstruction method for ultrasound or CT images with a two-view twin Transformer
CN114612305B (en) Event-driven video super-resolution method based on stereogram modeling
CN115294199A (en) Underwater image enhancement and depth estimation method, device and storage medium
CN113689547B (en) Ultrasonic or CT medical image three-dimensional reconstruction method with a cross-view vision Transformer
CN113689543B (en) Epipolar constrained sparse attention mechanism medical image three-dimensional reconstruction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant