WO2022163402A1 - Method for generating a trained model, machine learning system, program, and medical image processing apparatus - Google Patents
Method for generating a trained model, machine learning system, program, and medical image processing apparatus
- Publication number
- WO2022163402A1 (PCT/JP2022/001351; JP2022001351W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- dimensional
- resolution
- generator
- learning
- Prior art date
Classifications
- G06T11/008—Specific post-processing after tomographic reconstruction, e.g. voxelisation, metal artifact correction
- A61B5/055—Detecting, measuring or recording for diagnosis involving electronic [EMR] or nuclear [NMR] magnetic resonance, e.g. magnetic resonance imaging
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
- G06T3/4053—Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T7/00—Image analysis
- G06T7/11—Region-based segmentation
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/7747—Generating sets of training patterns; Organisation of the process, e.g. bagging or boosting
- G06V10/82—Image or video recognition or understanding using neural networks
- A61B6/03—Computed tomography [CT]
- G06T2200/04—Involving 3D image data
- G06T2207/10081—Computed x-ray tomography [CT]
- G06T2207/10088—Magnetic resonance imaging [MRI]
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20132—Image cropping
- G06T2207/30004—Biomedical image processing
- G06T2211/441—AI-based methods, deep learning or artificial neural networks
- G06V2201/03—Recognition of patterns in medical or anatomical images
Definitions
- the present invention relates to a method of generating a trained model, a machine learning system, a program, and a medical image processing apparatus, and more particularly to machine learning technology and image processing technology for domain conversion of 3D images.
- Patent Document 1 describes a diagnosis support system that uses artificial intelligence (AI) to extract organ regions from medical images.
- Patent Document 2 describes an image processing method for generating high-definition three-dimensional data with a changed slice thickness from three-dimensional data of a predetermined slice thickness captured by a modality such as a CT apparatus.
- Non-Patent Document 1 discloses a technique that uses a network combining two generative adversarial network (GAN) configurations to convert images between two different domains without using paired images as training data.
- Non-Patent Document 2 applies the technology of Non-Patent Document 1 and proposes a method for learning the task of domain transformation and organ region extraction for three-dimensional medical images.
- Medical images are generated by various modalities, and the characteristics of the images differ for each modality.
- A computer-aided diagnosis/detection (CAD) system using AI is generally constructed separately for each modality in which the target medical images are captured. If technology built for one specific modality could be applied to images from other modalities, it could be used in many more situations.
- To achieve this, a high-performance image converter is required that can convert between different modalities, for example generating a pseudo-MR image from a CT image or, conversely, generating a pseudo-CT image from an MR image. Note that "image conversion" may be rephrased as "image generation", and the converter may be rephrased as a "generator". A modality can be understood as a kind of domain corresponding to the characteristics of its images.
- Medical images captured with a CT apparatus or MRI apparatus can be three-dimensional data in which two-dimensional slice images are stacked in the slice-thickness direction, and image conversion that handles such three-dimensional data is desired.
- Non-Patent Document 1 deals with two-dimensional images and does not describe application to three-dimensional images.
- Non-Patent Document 2 proposes a method of learning image conversion between different domains for three-dimensional medical images; however, it requires training the model with datasets that are high resolution in each of the three types of cross sections.
- In practice, however, 3D data is often captured under different imaging conditions, such that only one specific cross section among the three types has high resolution. For example, thick-slice three-dimensional data, which is widely used in actual clinical practice, has low resolution in the slice-thickness direction, so only one of the three types of cross sections is high resolution.
- In contrast, thin-slice three-dimensional data with a slice thickness of 1 mm or less has high resolution along each of three orthogonal axes (e.g., the x-, y-, and z-axes), including the slice-thickness direction, so all three types of cross sections are high resolution. Thin-slice three-dimensional data takes longer to capture than thick-slice data and produces a larger volume of data; therefore, many medical institutions routinely acquire thick-slice data, and thick-slice data is relatively easy to obtain compared with thin-slice data.
- the features of the generated image generated by the model depend on the data used for learning.
- If the learning architecture for two-dimensional images described in Non-Patent Document 1 is applied as-is to three-dimensional images and easily available thick-slice data is used as training data, the generated images have the same characteristics (thick slices) as the training data, making it difficult to generate three-dimensional images that are high resolution in each of the three types of cross sections.
- The present disclosure has been made in view of such circumstances, and an object thereof is to provide a trained-model generation method, a machine learning system, a program, and a medical image processing apparatus capable of generating a high-resolution three-dimensional generated image by transforming the domain of a captured three-dimensional image.
- A method of generating a trained model according to one aspect of the present disclosure generates a trained model that transforms the domain of an input three-dimensional image and outputs a three-dimensional generated image of a different domain. The learning model has the structure of a generative adversarial network that includes: a first generator, configured using a three-dimensional convolutional neural network, that receives an input three-dimensional image of a first domain and outputs a three-dimensional generated image of a second domain different from the first domain; and a first discriminator, configured using a two-dimensional convolutional neural network, that receives a two-dimensional image showing a cross-sectional image in a first slice-plane direction cut out from the three-dimensional generated image of the second domain generated by the first generator, and discriminates whether the input two-dimensional image is real or fake. A computer acquires a plurality of learning data, including three-dimensional images captured under a first imaging condition and three-dimensional images captured under a second imaging condition different from the first imaging condition, and performs a learning process in which the first generator and the first discriminator are trained adversarially.
- According to this aspect, the learning data can be cut into two-dimensional cross-sectional images whose slice-plane direction is the direction with relatively high resolution, and these can be input to the first discriminator; the first generator can thereby learn to generate high-quality cross-sectional images in the first slice-plane direction.
- The trained first generator can then be used as a trained model to perform the task of cross-domain image generation, transforming the domain of a three-dimensional image.
- The method of generating a trained model can be understood as a method of producing a trained model, and may also be understood as a machine learning method implemented using a computer. Resolution may also be called spatial resolution.
- In a method of generating a trained model according to another aspect of the present disclosure, the computer may perform a first clipping process of cutting out, from the three-dimensional generated image of the second domain generated by the first generator, a two-dimensional image showing a cross-sectional image in the first slice-plane direction, and may input the clipped two-dimensional image to the first discriminator.
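The clipping process described above reduces to indexing one axis of the generated volume. The following is a minimal numpy sketch, not the patent's implementation; the function name, the (z, y, x) axis ordering, and the plane names are assumptions for illustration.

```python
import numpy as np

def clip_cross_section(volume: np.ndarray, plane: str, index: int) -> np.ndarray:
    """Cut a 2D cross-sectional image out of a 3D volume.

    `volume` is assumed to be ordered (z, y, x): "axial" is the plane
    parallel to the x-y directions, "coronal" parallel to z-x, and
    "sagittal" parallel to z-y.
    """
    if plane == "axial":      # x-y plane, fixed z
        return volume[index, :, :]
    if plane == "coronal":    # z-x plane, fixed y
        return volume[:, index, :]
    if plane == "sagittal":   # z-y plane, fixed x
        return volume[:, :, index]
    raise ValueError(f"unknown plane: {plane}")

# Example: a generated volume of shape (depth=32, height=64, width=64).
generated = np.zeros((32, 64, 64), dtype=np.float32)
axial = clip_cross_section(generated, "axial", 16)      # shape (64, 64)
coronal = clip_cross_section(generated, "coronal", 10)  # shape (32, 64)
```

Each clipped 2D image would then be fed to the 2D discriminator; this is what lets a 2D CNN judge the quality of a 3D generator's output one cross section at a time.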
- The first imaging condition may include that the device used for imaging is a first imaging device, and the second imaging condition may include that the device used for imaging is a second imaging device of a different type from the first imaging device.
- The first imaging condition may include a first resolution condition, and the second imaging condition may include a second resolution condition different from the first resolution condition.
- At least one of the first and second imaging conditions may specify, as its resolution condition, anisotropic three-dimensional data in which the resolution in one of three orthogonal axial directions is lower than the resolution in each of the other two axial directions; the first slice-plane direction can then be the slice-plane direction parallel to the two axial directions with relatively high resolution in that anisotropic three-dimensional data.
- The learning model may further include a second discriminator, configured using a two-dimensional convolutional neural network, that receives a two-dimensional image showing a cross-sectional image in a second slice-plane direction and discriminates whether the input two-dimensional image is real or fake, and the learning process may include training the first generator and the second discriminator adversarially.
- The computer may perform a second clipping process of cutting out, from the three-dimensional generated image of the second domain generated by the first generator, a two-dimensional image showing a cross-sectional image in the second slice-plane direction, and may input the clipped two-dimensional image to the second discriminator.
- The learning data may include z-axis low-resolution anisotropic 3D data, whose resolution in the z-axis direction is lower than the respective resolutions in the x-axis and y-axis directions, and y-axis low-resolution anisotropic 3D data, whose resolution in the y-axis direction is lower than the respective resolutions in the z-axis and x-axis directions; in this case, the first slice-plane direction can be the slice-plane direction parallel to the x-axis and y-axis directions, and the second slice-plane direction can be the slice-plane direction parallel to the z-axis and x-axis directions.
- Alternatively, the learning data may include y-axis low-resolution anisotropic 3D data, whose resolution in the y-axis direction is lower than the respective resolutions in the z-axis and x-axis directions, and x-axis low-resolution anisotropic 3D data, whose resolution in the x-axis direction is lower than the respective resolutions in the y-axis and z-axis directions; in this case, the first slice-plane direction can be the slice-plane direction parallel to the z-axis and x-axis directions, and the second slice-plane direction can be the slice-plane direction parallel to the y-axis and z-axis directions.
- Alternatively, the learning data may include x-axis low-resolution anisotropic 3D data, whose resolution in the x-axis direction is lower than the respective resolutions in the y-axis and z-axis directions, and z-axis low-resolution anisotropic 3D data, whose resolution in the z-axis direction is lower than the respective resolutions in the x-axis and y-axis directions; in this case, the first slice-plane direction can be the slice-plane direction parallel to the y-axis and z-axis directions, and the second slice-plane direction can be the slice-plane direction parallel to the x-axis and y-axis directions.
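The three configurations above amount to a lookup from a volume's low-resolution axis to the two slice-plane directions used for discrimination. A small sketch under that reading (the dictionary, function name, and spacing representation are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

# Mapping from the low-resolution axis of the training volume to the
# slice-plane directions for the first and second discriminators,
# following the three configurations described above.
SLICE_PLANES = {
    # low-res axis: (first slice-plane direction, second slice-plane direction)
    "z": ("x-y", "z-x"),
    "y": ("z-x", "y-z"),
    "x": ("y-z", "x-y"),
}

def low_resolution_axis(spacings: dict) -> str:
    """Return the axis with the largest voxel spacing, i.e. lowest resolution."""
    return max(spacings, key=spacings.get)

# Example: a thick-slice volume with 5 mm slices along z and 1 mm in-plane.
spacing = {"z": 5.0, "y": 1.0, "x": 1.0}
axis = low_resolution_axis(spacing)           # "z"
first_plane, second_plane = SLICE_PLANES[axis]  # ("x-y", "z-x")
```

Note that in every configuration the first slice-plane direction is the plane spanned by the two high-resolution axes, so the first discriminator always sees the sharpest available cross sections.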
- It is possible to adopt a configuration in which the first discriminator or the second discriminator used for real/fake discrimination of the three-dimensional generated image of the second domain is selectively switched according to the resolution condition of the input learning data.
- As the three-dimensional image used for learning, anisotropic three-dimensional data in which the resolution in one of three orthogonal axial directions is lower than the resolution in the other two axial directions can be used.
- The computer may perform a first isotropization process that converts a three-dimensional image captured under the first imaging condition into isotropic three-dimensional data having the same resolution in each of three orthogonal axial directions, and may input the converted isotropic three-dimensional data to the first generator.
- The first generator may be configured to receive input of isotropic three-dimensional data having the same resolution in each of three orthogonal axial directions, and to output isotropic three-dimensional data as the three-dimensional generated image.
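The isotropization step can be realized with simple interpolation along the slice-thickness axis. The sketch below uses linear interpolation in plain numpy; the patent does not specify the interpolation method, so the function name, the axis-0 convention, and the integer scale factor are assumptions for illustration.

```python
import numpy as np

def isotropize(volume: np.ndarray, scale: int) -> np.ndarray:
    """Upsample a thick-slice volume along axis 0 (the slice-thickness
    direction) by an integer factor using linear interpolation, so that
    voxel spacing becomes approximately equal along all three axes.
    """
    depth = volume.shape[0]
    # Positions of the new samples expressed in original slice coordinates.
    new_positions = np.linspace(0, depth - 1, num=(depth - 1) * scale + 1)
    lower = np.floor(new_positions).astype(int)
    upper = np.minimum(lower + 1, depth - 1)
    frac = (new_positions - lower)[:, None, None]
    return (1.0 - frac) * volume[lower] + frac * volume[upper]

# Example: 5 mm slices resampled toward 1 mm spacing (factor 5).
thick = np.random.rand(8, 16, 16).astype(np.float32)
iso = isotropize(thick, scale=5)   # shape (36, 16, 16)
```

In practice a spline-based resampler (e.g. `scipy.ndimage.zoom`) would likely be used instead; the point is only that the generator then sees volumes with a uniform voxel grid.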
- The learning model may further include: a second generator, configured using a three-dimensional convolutional neural network, that receives an input three-dimensional image of the second domain and outputs a three-dimensional generated image of the first domain; and a third discriminator, configured using a two-dimensional convolutional neural network, that receives a two-dimensional image showing a cross-sectional image in a specific slice-plane direction cut out from the three-dimensional generated image of the first domain generated by the second generator, and discriminates whether the input two-dimensional image is real or fake. The learning process may include training the second generator and the third discriminator adversarially.
- This aspect can be understood as an application of the so-called CycleGAN mechanism described in Non-Patent Document 1.
- The computer may perform a third clipping process of cutting out, from the three-dimensional generated image of the first domain generated by the second generator, a two-dimensional image showing a cross-sectional image in the specific slice-plane direction, and may input the clipped two-dimensional image to the third discriminator.
- The computer may input the three-dimensional generated image of the second domain output from the first generator into the second generator and, based on the first reconstructed image output from the second generator, calculate a first reconstruction loss for the conversion process that uses the first generator and the second generator in this order. Likewise, the computer may input the three-dimensional generated image of the first domain output from the second generator into the first generator and, based on the second reconstructed image output from the first generator, calculate a second reconstruction loss for the conversion process that uses the second generator and the first generator in this order.
- The computer may apply a first average pooling process to the first reconstructed image so as to convert it into three-dimensional data with the same resolution as the original learning data used as input to the first generator, and may calculate the first reconstruction loss based on the three-dimensional data after conversion by the first average pooling process and that original learning data.
- Similarly, the computer may apply a second average pooling process to the second reconstructed image so as to convert it into three-dimensional data with the same resolution as the original learning data used as input to the second generator, and may calculate the second reconstruction loss based on the three-dimensional data after conversion by the second average pooling process and that original learning data.
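The average pooling step brings an isotropic reconstruction back to the slice spacing of the original anisotropic training volume before comparing the two. A minimal numpy sketch, assuming axis 0 is the slice-thickness direction and an L1 norm for the loss (the patent does not fix the norm; the function names and the integer factor are also illustrative):

```python
import numpy as np

def average_pool_axis0(volume: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a volume along axis 0, reducing an isotropic
    reconstruction to the slice spacing of the thick-slice original."""
    d, h, w = volume.shape
    assert d % factor == 0, "depth must be divisible by the pooling factor"
    return volume.reshape(d // factor, factor, h, w).mean(axis=1)

def reconstruction_loss(reconstructed_iso: np.ndarray,
                        original_thick: np.ndarray,
                        factor: int) -> float:
    """L1 loss between the pooled reconstruction and the original
    anisotropic training volume."""
    pooled = average_pool_axis0(reconstructed_iso, factor)
    return float(np.abs(pooled - original_thick).mean())

# Example: original thick-slice volume (8 slices), reconstruction at 4x depth.
original = np.random.rand(8, 16, 16).astype(np.float32)
reconstructed = np.repeat(original, 4, axis=0)   # a "perfect" reconstruction
loss = reconstruction_loss(reconstructed, original, factor=4)
```

Pooling the reconstruction down, rather than upsampling the original, means the cycle-consistency penalty never rewards the generator for reproducing interpolation artifacts of the low-resolution input.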
- The learning model may further include a fourth discriminator, configured using a two-dimensional convolutional neural network, that receives a two-dimensional image showing a cross-sectional image in a slice-plane direction orthogonal to the specific slice-plane direction, cut out from the three-dimensional generated image of the first domain generated by the second generator, and discriminates whether the input two-dimensional image is real or fake; the learning process may include training the second generator and the fourth discriminator adversarially.
- The computer may perform a fourth clipping process of cutting out, from the three-dimensional generated image of the first domain generated by the second generator, a two-dimensional image showing a cross-sectional image in the slice-plane direction orthogonal to the specific slice-plane direction, and may input the clipped two-dimensional image to the fourth discriminator.
- the specific slice plane direction may be the first slice plane direction.
- The computer may perform a second isotropization process that converts a three-dimensional image captured under the second imaging condition into isotropic three-dimensional data having the same resolution in each of three orthogonal axial directions, and may input the converted isotropic three-dimensional data to the second generator.
- The first imaging condition may correspond to the first domain, and the second imaging condition may correspond to the second domain.
- The three-dimensional image captured under the first imaging condition may be a first modality image captured using a first modality, which is a medical device, and the three-dimensional image captured under the second imaging condition may be a second modality image captured using a second modality, which is a medical device of a different type from the first modality. The learning model may be trained so that, upon receiving an input first modality image, it generates a pseudo second-modality image having the features of an image captured using the second modality.
- The first domain may have a first resolution, and the second domain may have a second resolution higher than the first resolution.
- The three-dimensional image captured under the first imaging condition may be first-axis low-resolution three-dimensional data in which the resolution in a first axial direction among three orthogonal axes is lower than the resolution in each of the other two axial directions, and the three-dimensional image captured under the second imaging condition may be second-axis low-resolution three-dimensional data in which the resolution in a second axial direction, different from the first axial direction, is lower than the resolution in the other two axial directions. The learning model may be trained so that, upon receiving an input of at least one of the first-axis low-resolution three-dimensional data and the second-axis low-resolution three-dimensional data, it generates isotropic three-dimensional data having a resolution higher than that of the input three-dimensional data.
- The computer may perform a resolution-reduction process that lowers the resolution of the three-dimensional generated image generated by the first generator, and may calculate the reconstruction loss of the image conversion, consisting of the super-resolution processing by the first generator followed by the resolution-reduction process, based on the reconstructed image obtained by the resolution-reduction process.
- A machine learning system according to another aspect of the present disclosure trains a learning model that transforms the domain of an input three-dimensional image to generate a three-dimensional generated image of a different domain. The system comprises at least one first processor and at least one first storage device storing a program executed by the at least one first processor. The learning model includes: a first generator, configured using a three-dimensional convolutional neural network, that receives an input three-dimensional image of a first domain and outputs a three-dimensional generated image of a second domain different from the first domain; and a first discriminator, configured using a two-dimensional convolutional neural network, that receives a two-dimensional image showing a cross-sectional image in a first slice-plane direction cut out from the three-dimensional generated image of the second domain and discriminates whether the input two-dimensional image is real or fake. By executing the instructions of the program, the first processor acquires a plurality of learning data, including three-dimensional images captured under a first imaging condition and three-dimensional images captured under a second imaging condition different from the first imaging condition, and performs a learning process that trains the first generator and the first discriminator adversarially.
- a program causes a computer to execute a process of training a learning model that converts the domain of an input 3D image to generate a 3D generated image of a different domain
- the learning model includes a first generator configured using a three-dimensional convolutional neural network that receives an input of a three-dimensional image of a first domain and outputs a three-dimensional generated image of a second domain different from the first domain, and a first discriminator configured using a two-dimensional convolutional neural network that receives an input of a two-dimensional image representing a cross-sectional image in a first slice plane direction extracted from the three-dimensional generated image of the second domain generated by the first generator and discriminates the authenticity of the input two-dimensional image; the program causes the computer to acquire a plurality of learning data including a three-dimensional image captured under the first imaging condition and a three-dimensional image captured under a second imaging condition different from the first imaging condition, and to execute a learning process for adversarial training of the first generator and the first discriminator.
- a medical image processing apparatus includes a storage device that stores a first trained model, which is a first trained generator trained by performing a trained model generation method according to the present disclosure, and a second processor that performs image processing using the first trained model; the first trained model receives an input of a first medical image and generates a medical image of a domain different from that of the first medical image.
- FIG. 1 shows examples of three cross-sectional images cut out from a three-dimensional morphological image of the brain taken by an MRI apparatus.
- FIG. 2 is an image example showing the expression difference between a thin slice and a thick slice in a CT image.
- FIG. 3 is an example of a thick-slice MR image captured using an MRI apparatus.
- FIG. 4 is an example of a thin-slice CT image captured using a CT apparatus.
- FIG. 5 is a conceptual diagram showing an overview of processing in the machine learning system according to the first embodiment.
- FIG. 6 is a functional block diagram showing a configuration example of the machine learning system according to the first embodiment.
- FIG. 7 is a functional block diagram illustrating a configuration example of a learning data generation unit;
- FIG. 8 is a conceptual diagram of a learning data set used in the first embodiment.
- FIG. 9 is a conceptual diagram showing Modification 1 of the first embodiment.
- FIG. 10 is a conceptual diagram showing an outline of processing in the machine learning system 100 that learns the MR ⁇ CT domain conversion task.
- FIG. 11 is a conceptual diagram of a learning data set used in the second embodiment.
- FIG. 12 is a functional block diagram showing a configuration example of a machine learning system according to the second embodiment.
- FIG. 13 is a schematic diagram showing the processing flow at the time of CT input in the machine learning system according to the second embodiment.
- FIG. 14 is a schematic diagram showing the processing flow at the time of MR input in the machine learning system according to the second embodiment.
- FIG. 15 is a schematic diagram showing a processing flow when a thick-slice MR image having a high-resolution axial section is input.
- FIG. 16 is a schematic diagram showing a processing flow when a thick-slice MR image with a high-resolution coronal section is input.
- FIG. 17 is an example image showing the performance of CT ⁇ MR conversion by a trained generator obtained by performing learning using the machine learning system according to the second embodiment.
- FIG. 18 is an image example showing the performance of MR ⁇ CT conversion by a trained generator obtained by performing learning using the machine learning system according to the second embodiment.
- FIG. 19 is a configuration example of a learning model applied to a machine learning system according to a comparative example.
- FIG. 20 is an example of a pseudo MR image generated by a generator that has learned a CT ⁇ MR conversion task using a machine learning system according to a comparative example.
- FIG. 21 is a block diagram showing a configuration example of an information processing device applied to the machine learning system.
- FIG. 22 is a block diagram showing a configuration example of a medical image processing apparatus using a learned generator generated by performing learning using a machine learning system.
- FIG. 23 is a conceptual diagram showing an outline of processing of the machine learning system according to the third embodiment.
- FIG. 24 is a conceptual diagram showing an overview of processing in the machine learning system according to the fourth embodiment.
- FIG. 25 is a schematic diagram showing a processing flow when a three-dimensional image with a high-resolution axial section is input in the machine learning system according to the fourth embodiment.
- FIG. 26 is a schematic diagram showing a processing flow when a high-resolution three-dimensional image of a coronal cross section is input in the machine learning system according to the fourth embodiment.
- FIG. 27 is a block diagram showing an example of the hardware configuration of a computer;
- Modalities such as CT devices and MRI devices are typical examples of devices that take medical images.
- the basic idea is to obtain three-dimensional data representing the three-dimensional form of an object by continuously capturing two-dimensional slice images.
- three-dimensional data includes the concept of a set of two-dimensional slice images taken continuously, and is synonymous with three-dimensional images.
- image includes the meaning of image data.
- a collection of consecutive two-dimensional slice images is sometimes called a "two-dimensional image sequence" or a "two-dimensional image series.”
- the term "two-dimensional image” includes the concept of a two-dimensional slice image derived from three-dimensional data.
- Three types of cross sections (two-dimensional slice cross sections) obtained by reconstructing data from imaging equipment such as CT devices or MRI devices are conceivable: axial cross sections, sagittal cross sections, and coronal cross sections.
- Fig. 1 is an example of images of three types of cross sections extracted from a three-dimensional morphological image of the brain taken by an MRI device.
- a sagittal cross-sectional image is shown on the left
- an axial cross-sectional image is shown in the center
- a coronal cross-sectional image is shown on the right.
- an orthogonal coordinate system is introduced in which the body axis direction is the z-axis direction
- the left-right direction (horizontal direction) of the human body in a standing posture is the x-axis direction
- the depth direction (front-rear direction) of the human body is the y-axis direction
- the axial section is a section (xy plane) perpendicular to the z-axis, i.e., a plane parallel to the x-axis direction and the y-axis direction.
- a sagittal section is a section (yz plane) perpendicular to the x-axis.
- a coronal cross section is a cross section (zx plane) orthogonal to the y-axis.
- FIG. 2 is an image example showing the expression difference between a thin slice and a thick slice in a CT image.
- the upper part of FIG. 2 shows an image example of each of three types of cross sections when a thin slice with a slice thickness of 1 mm is reconstructed in the axial cross section.
- the lower part of FIG. 2 shows an example of images of three types of cross sections when a thick slice with a slice thickness of 8 mm is reconstructed in the axial cross section.
- the left is an axial section
- the center is a sagittal section
- the right is a coronal section.
- FIG. 3 shows an example of an MR thick slice as modality A, and FIG. 4 shows an example of a CT thin slice as modality B.
- As shown in FIG. 3, the three images shown on the left side of FIG. 3 are examples of MR images reconstructed as thick slices with a high-resolution coronal section, and the three images shown on the right side of FIG. 3 are examples of MR images reconstructed as thick slices with a high-resolution axial section.
- For MR images taken by an MRI apparatus, there may be three-dimensional data in which only the coronal section has high resolution while both the axial and sagittal sections have low resolution, as well as three-dimensional data in which only the axial section has high resolution while the coronal and sagittal sections have low resolution.
- Three-dimensional data in which only the coronal cross section has high resolution is data with high resolution in the x-axis and z-axis directions and low resolution in the y-axis direction.
- three-dimensional data in which only the axial section has high resolution is data with high resolution in the x-axis and y-axis directions and low resolution in the z-axis direction.
- Actual MR images can include various types of images such as T1-weighted images, T2-weighted images, heavily T2-weighted images, and diffusion-weighted images.
- As shown in FIG. 4, the thin-slice three-dimensional data obtained by imaging using a CT apparatus can be data with high resolution for all three types of cross sections (all three axial directions).
- the difference in resolution in each axial direction in the three-dimensional data as shown in FIGS. 3 and 4 depends on the imaging conditions when acquiring the three-dimensional data. Note that when a thick slice is imaged by a CT apparatus as well, only a cross section in a specific direction has high resolution, as shown in FIG. 3.
- FIG. 5 is a conceptual diagram showing an outline of processing in the machine learning system 10 according to the first embodiment.
- the source domain is CT
- the target domain is MR
- a method of learning an image transformation task of generating a pseudo MR image from a CT image based on the GAN architecture will be described.
- the machine learning system 10 includes a generator 20G configured using a three-dimensional CNN (Convolutional Neural Network) and at least two discriminators 24D and 26D, each configured using a two-dimensional CNN.
- the generator 20G is a three-dimensional generation network (3D generator) that receives input of three-dimensional data having CT domain features and outputs three-dimensional data having MR domain features.
- For example, a V-net type architecture, which is a three-dimensional extension of U-net, is applied to the generator 20G.
- U-net is a neural network widely used for segmentation of medical images.
- An example of a document describing U-net is “Olaf Ronneberger, et al. "U-Net: Convolutional Networks for Biomedical Image Segmentation", MICCAI, 2015.”
- An example of a document describing V-net is "Fausto Milletari, et al., V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation".
- the plurality of discriminators 24D and 26D are two-dimensional discriminator networks (2D discriminators), each of which receives an input of a two-dimensional image in a different cross-sectional direction and discriminates the authenticity of that image.
- Each of the discriminators 24D and 26D employs, for example, a two-dimensional discriminator architecture used in a technique called Pix2Pix.
- An example of a document describing Pix2Pix is "Phillip Isola, et al., Image-to-Image Translation with Conditional Adversarial Nets".
- the inputs to the discriminators 24D and 26D are treated as two-dimensional images divided in a specific slice thickness direction. Then, the average values of the authenticity discrimination results obtained for each of these divided slice images (two-dimensional images) are used as the final outputs of the discriminators 24D and 26D.
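The slice-wise averaging described above can be sketched as follows. This is an illustrative outline only: `toy_disc` is a hypothetical stand-in for a real 2D discriminator network, and the (z, y, x) axis convention is an assumption.

```python
import numpy as np

def discriminate_volume(volume, disc_2d, axis=0):
    """Apply a 2D discriminator to every slice along `axis` and
    average the per-slice real/fake scores into one final output."""
    slices = np.moveaxis(volume, axis, 0)  # (num_slices, H, W)
    scores = [disc_2d(s) for s in slices]
    return float(np.mean(scores))

# A stand-in 2D discriminator: here simply the mean intensity of the slice.
toy_disc = lambda img: img.mean()

vol = np.full((8, 64, 64), 0.5)  # a constant 3D volume for illustration
print(discriminate_volume(vol, toy_disc, axis=0))  # 0.5
```

Passing `axis=0` splits the volume into axial slices; a coronal discriminator would use the axis corresponding to the y direction instead.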
- The learning data includes three-dimensional CT data actually obtained by imaging using a CT apparatus and three-dimensional MR data (real MR images) actually obtained by imaging using an MRI apparatus.
- the heterogeneous modality images given as inputs have the same imaging target region.
- the output (generated image) after conversion by the generator 20G is assumed to be an image of the same part as the input image.
- It is assumed that the paired CT three-dimensional data and MR three-dimensional data used for training cover the same imaging range, or substantially the same imaging range within a permissible deviation, for the same patient.
- Two types of MR three-dimensional data are used for learning: thick-slice data in which the axial section has high resolution (the sagittal and coronal sections have low resolution) and thick-slice data in which the coronal section has high resolution (the axial and sagittal sections have low resolution).
- one discriminator 24D is a 2D axial discriminator that discriminates true/false with respect to the input of the two-dimensional image of the axial section
- the other discriminator 26D is a 2D coronal discriminator that discriminates the authenticity of an input two-dimensional image of the coronal section.
- the CT three-dimensional data used for learning in the first embodiment may be thin-slice data with high resolution in each of the three types of cross sections (see FIG. 4), or, similarly to the MR data, thick-slice data in which any one of the three types of cross sections has low resolution.
- the generator 20G is configured to receive an isotropic resolution 3D CT image and output an isotropic resolution 3D MR generated image.
- the machine learning system 10 includes an isotropic processing unit 12 that performs isotropic processing of three-dimensional data in the preceding stage of the generator 20G.
- the isotropic process converts the pixel size in each of the x-axis, y-axis, and z-axis directions to equal intervals, that is, it converts the unit length in each axis direction into a physical size with equal spacing. In other words, the isotropic process corresponds to transforming the voxels of the three-dimensional data into cubes of a predetermined size.
- Isotropic resolution means that the shape of the voxel is cubic, that is, that the resolution in the x-axis, y-axis, and z-axis directions of the three-dimensional image is the same.
- the isotropic processing unit 12 uses, for example, nearest neighbor interpolation, linear interpolation, spline interpolation, or the like to interpolate data onto a regular grid with a physical size of 1 mm³ per unit in three-dimensional space.
- the physical size of the regular lattice unit is not limited to 1 mm³, and may be any size that provides sufficient resolution required for interpretation.
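The isotropic resampling described above can be sketched as follows, using nearest-neighbor interpolation, one of the methods the text names. This is an illustrative sketch, assuming volumes in (z, y, x) order with per-axis voxel spacing given in millimeters; function and variable names are hypothetical.

```python
import numpy as np

def isotropize_nn(volume, spacing, target=1.0):
    """Resample `volume` (z, y, x) with per-axis voxel spacing (mm) onto an
    isotropic grid of `target` mm using nearest-neighbour interpolation."""
    new_shape = [max(1, int(round(n * s / target)))
                 for n, s in zip(volume.shape, spacing)]
    # For each output position, pick the nearest source index along each axis.
    idx = [np.minimum((np.arange(m) * target / s).astype(int), n - 1)
           for m, n, s in zip(new_shape, volume.shape, spacing)]
    return volume[np.ix_(idx[0], idx[1], idx[2])]

# Thick-slice example: 8 mm slices along z, 1 mm in-plane spacing.
vol = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)
iso = isotropize_nn(vol, spacing=(8.0, 1.0, 1.0))
print(iso.shape)  # (32, 2, 2): each 8 mm slice is expanded into 1 mm voxels
```

With linear or spline interpolation in place of the nearest-neighbor lookup, the repeated slices would instead be blended, which is usually preferable for display.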
- The CT three-dimensional data may be isotropized using a known technique before being input to the generator 20G; alternatively, without isotropization, the 3D CNN of the generator 20G may be designed such that the pseudo MR images it outputs are isotropic.
- Alternatively, the generator may be designed so that a three-dimensional CT image with an anisotropic unit lattice of (x, y, z) is input and an anisotropic three-dimensional pseudo MR image of the same grid size (preserving the respective pixel spacings in x, y, and z) is output.
- the machine learning system 10 further includes a first cutout processing unit 14 and a second cutout processing unit 16 that cut out two-dimensional images in at least two slice plane (cross-section) directions from the three-dimensional data generated by the generator 20G.
- the clipping processing performed by the first clipping processing unit 14 and the second clipping processing unit 16 is processing for extracting a slice (two-dimensional image) in a specific direction from three-dimensional data.
- the specific direction in which the clipping process is performed corresponds to the direction of the cross section of the two-dimensional image representing the cross-sectional image input to each of the discriminators 24D and 26D.
- Corresponding to the discriminator 24D, which receives input of a two-dimensional image of the axial section, and the discriminator 26D, which receives input of a two-dimensional image of the coronal section, the system has the first cutout processing unit 14, which cuts out slices of the axial section, and the second cutout processing unit 16, which cuts out slices of the coronal section.
- Each of the first cutout processing unit 14 and the second cutout processing unit 16 may perform processing for extracting all slices in a specific direction from the three-dimensional pseudo MR image output from the generator 20G.
- For example, the first cutout processing unit 14 may be configured to extract 64 two-dimensional images whose image size in the xy plane is 64 × 64, and the second cutout processing unit 16 may be configured to extract 64 two-dimensional images having an image size of 64 × 64 on the zx plane.
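The cutout processing described above amounts to splitting the 3D volume along one axis. The sketch below is illustrative only, assuming (z, y, x) storage order so that axial slices lie in the xy plane and coronal slices in the zx plane; the function names are hypothetical.

```python
import numpy as np

def cut_axial(volume):
    """All axial (xy-plane) slices: split the volume along the z-axis."""
    return [volume[k, :, :] for k in range(volume.shape[0])]

def cut_coronal(volume):
    """All coronal (zx-plane) slices: split the volume along the y-axis."""
    return [volume[:, j, :] for j in range(volume.shape[1])]

vol = np.zeros((64, 64, 64))  # a (z, y, x) pseudo MR volume for illustration
print(len(cut_axial(vol)), cut_axial(vol)[0].shape)    # 64 (64, 64)
print(len(cut_coronal(vol)), cut_coronal(vol)[0].shape)  # 64 (64, 64)
```

Each returned 2D slice is what would be fed to the corresponding 2D discriminator.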
- The discriminator 24D receives a two-dimensional image cut out by the first cutout processing unit 14, or a two-dimensional image extracted from three-dimensional data with a high-resolution axial section among the real MR images included in the learning data, and discriminates whether the image is a real image or a fake image generated by the generator 20G.
- The discriminator 26D receives a two-dimensional image extracted by the second cutout processing unit 16, or a two-dimensional image extracted from MR three-dimensional data with a high-resolution coronal section among the learning data, and discriminates whether the image is a real image or a fake image.
- "Real image" means an actual image obtained by photographing using an imaging device.
- a “fake image” means a generated image (pseudo image) artificially generated by image conversion processing without photographing.
- data used as learning data to be input to the learning model 44 is a "real image”
- the generated image generated by the generator 20G is a "fake image”.
- Because the real MR images prepared as learning data have high resolution in only one of the three cross-sectional directions, authenticity determination is performed using two-dimensional images of the high-resolution cross section.
- the two-dimensional discriminators 24D and 26D corresponding to the high-resolution slice plane direction are selectively switched and used according to the input data.
- the discriminators 24D and 26D used to evaluate images for authenticity are selectively used according to the resolution condition of the input image, and only high-resolution two-dimensional cross-sectional images are used to evaluate whether an image is a real image or a fake image.
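The selection rule above can be sketched as a simple dispatch on the voxel spacing of the input MR data: the coarsely spaced axis identifies which slice plane is sharp. This is an illustrative sketch, not the patent's implementation; the (z, y, x) spacing convention and function name are assumptions.

```python
def select_discriminator(spacing, disc_axial, disc_coronal):
    """Pick the 2D discriminator matching the high-resolution slice plane.

    spacing: (z, y, x) voxel spacing in mm. A coarse z-spacing means only
    the axial (xy) sections are high resolution; a coarse y-spacing means
    only the coronal (zx) sections are.
    """
    z, y, x = spacing
    if z > max(y, x):   # thick slices stacked along z -> axial plane is sharp
        return disc_axial
    if y > max(z, x):   # thick slices stacked along y -> coronal plane is sharp
        return disc_coronal
    raise ValueError("no single low-resolution axis: " + repr(spacing))

print(select_discriminator((8.0, 1.0, 1.0), "2D-axial", "2D-coronal"))  # 2D-axial
print(select_discriminator((1.0, 8.0, 1.0), "2D-axial", "2D-coronal"))  # 2D-coronal
```

Extending the dispatch with an x-spacing branch would cover the sagittal discriminator added in Modification 1.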
- the generator 20G is an example of the "first generator” in the present disclosure.
- the discriminator 24D is an example of the "first discriminator” in the present disclosure, and the discriminator 26D is an example of the "second discriminator” in the present disclosure.
- the CT domain is an example of the "first domain” in the present disclosure, and the MR domain is an example of the "second domain”.
- An example of the “second imaging condition” in the present disclosure is that the imaging equipment used for imaging is an MRI apparatus.
- a CT apparatus is an example of a "first imaging device” and a "first modality” in the present disclosure
- a CT image is an example of a "first modality image” in the present disclosure.
- An MRI apparatus is an example of a "second imaging device” and a "second modality” in the present disclosure
- an MR image is an example of a "second modality image” in the present disclosure.
- a thin slice is an example of the "first resolution condition” in the present disclosure.
- a thick slice is an example of a "second resolution condition” in the present disclosure.
- a slice plane direction in which an axial section is obtained is an example of a "first slice plane direction” in the present disclosure
- a slice plane direction in which a coronal section is obtained is an example of a "second slice plane direction” in the present disclosure
- the clipping process performed by the first clipping processor 14 is an example of the "first clipping process” in the present disclosure
- the clipping process performed by the second clipping processor 16 is an example of the "second clipping process” in the present disclosure
- the isotropic processing performed by the isotropic processing unit 12 is an example of the “first isotropic processing” in the present disclosure.
- FIG. 6 is a functional block diagram showing a configuration example of the machine learning system 10 according to the first embodiment.
- the machine learning system 10 includes a learning data generator 30 and a learning processor 40 .
- Machine learning system 10 may further include image storage 50 and learning data storage 54 .
- the machine learning system 10 can be implemented by a computer system including one or more computers.
- Each function of the learning data generation unit 30, the learning processing unit 40, the image storage unit 50, and the learning data storage unit 54 can be realized by a combination of computer hardware and software.
- the function of each of these units may be realized by one computer, or may be realized by dividing the function of processing by two or more computers.
- the learning data generation unit 30, the learning processing unit 40, the image storage unit 50, and the learning data storage unit 54 may be connected to each other via an electric communication line.
- connection is not limited to wired connections, but also includes the concept of wireless connections.
- a telecommunications line may be a local area network or a wide area network.
- the image storage unit 50 includes a large-capacity storage device that stores CT reconstructed images (CT images) captured by a medical X-ray CT apparatus and MR reconstructed images (MR images) captured by an MRI apparatus.
- the image storage unit 50 may be, for example, a DICOM server that stores medical images according to the DICOM (Digital Imaging and Communications in Medicine) standard.
- the medical image stored in the image storage unit 50 may be an image of each part of the human body, or may be an image of the whole body.
- the learning data generation unit 30 generates training data (learning data) used for machine learning.
- Learning data is synonymous with "training data.”
- The learning data includes three-dimensional data that are real CT images actually taken using a CT apparatus and three-dimensional data that are real MR images actually taken using an MRI apparatus.
- Such learning data can be generated from data stored in the image storage unit 50 .
- the learning data generation unit 30 acquires the original three-dimensional data from the image storage unit 50 and performs preprocessing such as isotropization, posture transformation, and extraction of a fixed-size region, to generate three-dimensional data with the number of pixels (voxels) and image size suitable for input to the learning processing unit 40. In order to carry out the learning process in the learning processing unit 40 efficiently, a plurality of pieces of learning data may be generated in advance using the learning data generation unit 30 and stored as a learning data set.
- the learning data storage unit 54 includes a storage for storing preprocessed learning data generated by the learning data generation unit 30.
- the learning data generated by the learning data generation unit 30 is read from the learning data storage unit 54 and input to the learning processing unit 40 .
- the learning data storage unit 54 may be included in the learning data generation unit 30, or part of the storage area of the image storage unit 50 may be used as the learning data storage unit 54. Also, part or all of the processing functions of the learning data generation unit 30 may be included in the learning processing unit 40 .
- the learning processing unit 40 includes an image acquisition unit 42 and a learning model 44 having a GAN structure.
- the image acquisition unit 42 acquires learning data to be input to the learning model 44 from the learning data storage unit 54 .
- Learning data acquired via the image acquisition unit 42 is input to the learning model 44 .
- the learning model 44 includes a generator 20G, a first clipping processor 14, a second clipping processor 16, and discriminators 24D and 26D.
- the learning processing unit 40 further includes an error computing unit 46 and an optimizer 48 .
- the error calculator 46 uses a loss function to evaluate the error between the outputs from the discriminators 24D and 26D and the correct answer. Furthermore, the error calculator 46 evaluates the error between the pseudo MR (fake MR) two-dimensional image extracted by the first cutout processing unit 14 and the corresponding correct (real MR) two-dimensional image.
- An error can be rephrased as a loss.
- the optimizer 48 performs processing for updating network parameters in the learning model 44 based on the calculation result of the error calculation unit 46 .
- the network parameters include filter coefficients (weights of connections between nodes) and node biases used for processing each layer of the CNN.
- the optimizer 48 performs a parameter calculation process for calculating the update amounts of the network parameters of the generator 20G and the discriminators 24D and 26D from the calculation results of the error calculation unit 46, and a parameter update process for updating the parameters of the generator 20G and the discriminators 24D and 26D according to the calculated update amounts.
- the optimizer 48 updates parameters based on algorithms such as the gradient descent method.
- the learning processing unit 40 trains the learning model 44 by repeating adversarial learning between the generator 20G and the discriminators 24D and 26D based on the input learning data, thereby improving the performance of each model.
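The adversarial objectives alternated in this loop can be illustrated as follows. The patent does not specify the loss form, so the least-squares GAN losses below are an assumed, commonly used choice, not the patent's definition: the discriminator is pushed to score real slices as 1 and generated slices as 0, while the generator is pushed to make its slices score 1.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Least-squares GAN loss for the discriminator:
    push scores for real slices toward 1 and for fake slices toward 0."""
    return float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def g_loss(d_fake):
    """Least-squares GAN loss for the generator:
    push the discriminator's scores on fake slices toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))

# A perfectly fooled discriminator: real and fake slices both scored 1.
d_real = np.ones(4)
d_fake = np.ones(4)
print(d_loss(d_real, d_fake))  # 1.0 (fake scores should have been 0)
print(g_loss(d_fake))          # 0.0 (generator fully fools the discriminator)
```

In each iteration the discriminator parameters would be updated against `d_loss` and the generator parameters against `g_loss` (plus any reconstruction term), alternating until training converges.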
- FIG. 7 is a functional block diagram showing a configuration example of the learning data generator 30.
- the learning data generation unit 30 includes an isotropic processing unit 12 , a posture conversion unit 32 , and a fixed-size area cutout processing unit 34 .
- In the learning data generation unit 30, the three-dimensional data isotropized by the isotropic processing unit 12 to a pixel unit size of 1 mm in each of the x-axis, y-axis, and z-axis directions is subjected to posture transformation by the posture conversion unit 32.
- the fixed-size area clipping processing unit 34 randomly clips a fixed-size area.
- the fixed-size area may be a three-dimensional area having a cubic shape, for example, "160 ⁇ 160 ⁇ 160" in the number of pixels in the x-axis direction ⁇ y-axis direction ⁇ z-axis direction.
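The random fixed-size cutout described above can be sketched as follows. This is an illustrative sketch of the preprocessing step, assuming an already isotropized (z, y, x) volume larger than the crop size; the function name is hypothetical.

```python
import numpy as np

def random_fixed_crop(volume, size=160, rng=None):
    """Randomly cut a cubic region of `size` voxels per axis out of an
    isotropized volume (z, y, x), as fixed-size training data."""
    rng = rng if rng is not None else np.random.default_rng()
    z0, y0, x0 = [int(rng.integers(0, n - size + 1)) for n in volume.shape]
    return volume[z0:z0 + size, y0:y0 + size, x0:x0 + size]

vol = np.zeros((200, 200, 200))  # isotropized original volume for illustration
patch = random_fixed_crop(vol, size=160, rng=np.random.default_rng(0))
print(patch.shape)  # (160, 160, 160)
```

For paired CT/MR training data, the same crop offsets would be applied to both volumes so the pair stays spatially aligned.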
- the original three-dimensional data input to the learning data generation unit 30 may be CT images or MR images.
- the fixed-size three-dimensional data cut out into the fixed-size region by the fixed-size area extraction processing unit 34 is stored in the learning data storage unit 54. Note that either the fixed-size three-dimensional data cut out into fixed-size regions or the original three-dimensional data before cutting may be understood as the learning data.
- FIG. 8 is a conceptual diagram of the learning data set used in the first embodiment.
- a plurality of pairs of CT thin-slice three-dimensional data and corresponding MR thick-slice three-dimensional data are prepared, and these image pairs are used as learning data.
- the discriminators 24D and 26D used for authenticity discrimination are switched according to the input three-dimensional data. That is, when an image pair of an MR image having a high-resolution axial section and the corresponding CT image is input, discrimination of the generated image after conversion by the generator 20G is performed by the discriminator 24D, which evaluates two-dimensional images of the axial section.
- Likewise, when an image pair of an MR image having a high-resolution coronal section and the corresponding CT image is input, discrimination of the generated image after conversion by the generator 20G is performed by the discriminator 26D, which evaluates two-dimensional images of the coronal section.
- Through this learning, the generator 20G acquires the ability to generate a three-dimensional image with high resolution in each of the x-axis, y-axis, and z-axis directions.
- As a result, a three-dimensional generator 20G capable of producing high-resolution images of each of the axial, coronal, and sagittal sections can be obtained.
- a method of generating a trained generator 20G by learning processing using the machine learning system 10 is an example of a "trained model generation method" in the present disclosure.
- the thin-slice CT image used for learning is an example of the "three-dimensional image captured under the first imaging condition" in the present disclosure, and the thick-slice MR image is an example of the "three-dimensional image captured under the second imaging condition" in the present disclosure.
- Thin-slice three-dimensional data is an example of "isotropic three-dimensional data" in the present disclosure, and thick-slice three-dimensional data is an example of "anisotropic three-dimensional data" in the present disclosure.
- Thick slice 3D data with a high-resolution axial section is anisotropic 3D data in which the resolution in the z-axis direction is lower than the resolution in each of the other two axial directions (x-axis direction and y-axis direction).
- the direction of the axial section in the axial high-resolution three-dimensional data is the slice plane direction parallel to the x-axis direction and the y-axis direction with relatively high resolution.
- Three-dimensional data of a thick slice with a high-resolution coronal cross section is an example of “y-axis low-resolution anisotropic three-dimensional data” in the present disclosure.
- a pseudo MR image output from the generator 20G is an example of a “second modality generated image” in the present disclosure.
- FIG. 9 is a conceptual diagram showing Modification 1 of the first embodiment.
- elements common to those in FIG. 5 are denoted by the same reference numerals, and overlapping descriptions are omitted.
- Regarding FIG. 9, the points different from FIG. 5 will be described.
- the machine learning system 11 includes, in addition to the configuration of FIG. 5, a cutout processing unit 18 for the sagittal section and a discriminator 28D for evaluating the two-dimensional image of the sagittal section.
- the discriminator 28D is configured using a two-dimensional CNN, like the other discriminators 24D and 26D.
- a pair of a thick-slice MR image with a high sagittal section resolution and a corresponding CT image can be used as learning data.
- In the machine learning system 11, when a pair of an MR image with a high-resolution sagittal section and the corresponding CT image is input, discrimination of the generated image after conversion by the generator 20G is performed by the discriminator 28D, which evaluates two-dimensional images of the sagittal section.
- ⁇ Modification 2>> In the first embodiment, an example of using low-resolution anisotropic three-dimensional data on the z-axis and low-resolution anisotropic three-dimensional data on the y-axis as learning data has been described. Combinations of low-resolution data are not limited to this example.
- In this case, the discriminators used are a 2D coronal discriminator that accepts input of a two-dimensional image of a coronal cross section and discriminates its authenticity, and a 2D sagittal discriminator that accepts input of a two-dimensional image of a sagittal cross section and discriminates its authenticity; when inputting to each discriminator, a cross-sectional image in the slice plane direction corresponding to that discriminator is cut out from the three-dimensional generated image.
- Three-dimensional data of a thick slice with a high-resolution sagittal section is an example of three-dimensional data with low resolution in the x-axis direction, and is an example of "x-axis low-resolution anisotropic three-dimensional data" in the present disclosure.
- in this case, a 2D sagittal discriminator and a 2D axial discriminator are used as discriminators, and when inputting to each discriminator, clipping processing of the cross-sectional image in the slice plane direction corresponding to that discriminator is performed on the three-dimensional generated image.
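The clipping of cross-sectional two-dimensional images from a three-dimensional generated image can be sketched as follows. This is a minimal NumPy illustration assuming a (z, y, x) volume layout; the function name and array shapes are hypothetical, not from the disclosure.

```python
import numpy as np

def extract_slices(volume, plane, indices):
    """Cut 2-D cross-sections from a (z, y, x) volume.

    plane: 'axial' (fixed z), 'coronal' (fixed y), or 'sagittal' (fixed x).
    Returns an array of shape (len(indices), H, W).
    """
    if plane == "axial":
        return volume[indices, :, :]
    if plane == "coronal":
        return volume[:, indices, :].transpose(1, 0, 2)
    if plane == "sagittal":
        return volume[:, :, indices].transpose(2, 0, 1)
    raise ValueError(plane)

# A dummy generated volume standing in for the generator output.
vol = np.random.rand(32, 64, 64).astype(np.float32)

axial = extract_slices(vol, "axial", [0, 16, 31])    # shape (3, 64, 64)
coronal = extract_slices(vol, "coronal", [10, 20])   # shape (2, 32, 64)
sagittal = extract_slices(vol, "sagittal", [5])      # shape (1, 32, 64)
```

Each 2D discriminator would then receive only the slices of its own cross-sectional direction.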
- the same architecture as that of the learning model according to the first embodiment may be applied.
- FIG. 10 is a conceptual diagram showing an outline of processing in the machine learning system 100 that learns the MR→CT domain conversion task.
- the machine learning system 100 includes a generator 120F configured using a 3D CNN and a discriminator 124D configured using a 3D CNN.
- the machine learning system 100 may include an isotropic processing unit 112 that performs isotropic processing on the three-dimensional data before input to the generator 120F, as well as a posture conversion unit and a fixed-size region clipping processing unit (not shown).
- the discriminator 124D receives input of the three-dimensional data generated by the generator 120F or three-dimensional data of a real CT image included in the learning data, and discriminates whether the input three-dimensional data is a real image or a fake image.
- FIG. 11 is a conceptual diagram of a learning data set used in the second embodiment.
- an image group of real CT images and an image group of real MR images need only exist.
- three-dimensional data of a plurality of real images are used in the respective domains of CT and MRI as data used for learning.
- the three-dimensional data of each domain may be the same as those described with reference to FIGS. 3 and 4.
- the learning data set used in the second embodiment includes a plurality of thin-slice three-dimensional data and a plurality of thick-slice three-dimensional data captured using an MRI apparatus. Note that the learning data set may include three-dimensional data of thick slices captured using a CT apparatus.
- FIG. 12 is a functional block diagram showing a configuration example of the machine learning system 210 according to the second embodiment. In FIG. 12, elements identical or similar to those in the previously described configurations are denoted by the same reference numerals, and overlapping descriptions are omitted.
- the learning data storage unit 54 shown in FIG. 12 stores a learning data set in which thin-slice three-dimensional data and thick-slice three-dimensional data as described in FIG. 11 are mixed.
- the machine learning system 210 includes a learning processing unit 240 instead of the learning processing unit 40 described above.
- the learning processing unit 240 includes an image acquisition unit 42 , a preprocessing unit 230 , a learning model 244 , an error calculation unit 246 and an optimizer 248 .
- the preprocessing unit 230 performs processing similar to that of the learning data generation unit 30 described above.
- the preprocessing unit 230 performs preprocessing for input to the learning model 244 on the three-dimensional data acquired via the image acquisition unit 42 .
- as the preprocessing, isotropic processing, posture conversion, and clipping processing of a fixed-size region are exemplified; these processes may be performed as necessary, and some or all of the processing in the preprocessing unit 230 may be omitted.
- the preprocessing unit 230 may be configured separately with a preprocessing unit for CT that preprocesses CT images and a preprocessing unit for MR that preprocesses MR images.
- the learning model 244 includes a first generator 220G, a first clipping processing unit 14, a second clipping processing unit 16, a first discriminator 224D, a second discriminator 226D, a second generator 250F, a third clipping processing unit 254, a fourth clipping processing unit 256, a third discriminator 264D, and a fourth discriminator 266D.
- the first generator 220G and the second generator 250F are each configured using a three-dimensional CNN. Each network structure of the first generator 220G and the second generator 250F may be similar to the generator 20G described in the first embodiment.
- Each of the first discriminator 224D, the second discriminator 226D, the third discriminator 264D, and the fourth discriminator 266D is configured using a two-dimensional CNN.
- the network structure of these classifiers may be the same as the classifiers 24D and 26D described in the first embodiment.
- the first generator 220G is a 3D generator that performs domain conversion from CT to MR; it receives input of three-dimensional data having CT domain features, and generates and outputs three-dimensional data having MR domain features.
- the description “3D_CT” input to the first generator 220G in FIG. 12 represents three-dimensional data of an isotropic real CT image.
- the second generator 250F is a 3D generator that performs domain conversion from MR to CT; it receives input of three-dimensional data having MR domain features, and generates and outputs three-dimensional data having CT domain features.
- the description “3D_MR” input to the second generator 250F in FIG. 12 represents three-dimensional data of an isotropic real MR image.
- the output of the first generator 220G is connected to the input of the second generator 250F, and the pseudo MR image generated by the first generator 220G can be input to the second generator 250F.
- the output of the second generator 250F is connected to the input of the first generator 220G, and the pseudo CT image generated by the second generator 250F can be input to the first generator 220G.
- the third clipping processing unit 254 performs clipping processing for extracting a slice of an axial section from the three-dimensional data of the pseudo CT image output from the second generator 250F.
- the two-dimensional image extracted by the third clipping processor 254 is input to the third discriminator 264D.
- to the third discriminator 264D, the two-dimensional image extracted by the third clipping processing unit 254, or a two-dimensional image of an axial section extracted from real three-dimensional CT data (a real CT image) included in the learning data, is input, and the third discriminator 264D discriminates whether the image is a real image or a fake image generated by the second generator 250F.
- the fourth clipping processing unit 256 performs clipping processing for extracting coronal cross-section slices from the three-dimensional pseudo CT image output from the second generator 250F.
- the two-dimensional image extracted by the fourth clipping processing unit 256 is input to the fourth discriminator 266D.
- to the fourth discriminator 266D, the two-dimensional image extracted by the fourth clipping processing unit 256, or a two-dimensional image of a coronal cross section extracted from real three-dimensional CT data (a real CT image) included in the learning data, is input, and the fourth discriminator 266D discriminates whether the image is a real image or a fake image.
- the error calculation unit 246 uses a loss function to evaluate the error (adversarial loss) between the output from each discriminator (224D, 226D, 264D, 266D) and the correct answer. Further, the error calculation unit 246 evaluates the reconstruction loss (cycle consistency loss) due to the image conversion connecting the first generator 220G and the second generator 250F.
- the reconstruction loss includes the error between the reconstructed generated image output from the second generator 250F when the output of the CT→MR conversion by the first generator 220G is input to the second generator 250F and the original input image input to the first generator 220G (reconstruction loss due to CT→MR→CT conversion), and the error between the reconstructed generated image output from the first generator 220G when the output of the MR→CT conversion by the second generator 250F is input to the first generator 220G and the original input image input to the second generator 250F (reconstruction loss due to MR→CT→MR conversion).
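The combination of adversarial loss and reconstruction (cycle consistency) loss described above can be sketched numerically. The least-squares form of the adversarial loss and the L1 form of the cycle loss are common choices for this kind of architecture, but they are assumptions here since the disclosure does not fix concrete loss functions; the identity stand-in generators are purely illustrative.

```python
import numpy as np

def adv_loss_d(d_real, d_fake):
    # Least-squares adversarial loss for a discriminator:
    # real slices should be scored 1, generated slices 0.
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def adv_loss_g(d_fake):
    # The generator is rewarded when its fakes are scored as real.
    return np.mean((d_fake - 1.0) ** 2)

def cycle_loss(original, reconstructed):
    # L1 reconstruction loss between an input volume and its round trip.
    return np.mean(np.abs(original - reconstructed))

# Stand-in generators (identity mappings), so the round trips close exactly.
g_ct2mr = lambda x: x   # plays the role of the first generator 220G
f_mr2ct = lambda x: x   # plays the role of the second generator 250F

ct = np.random.rand(16, 32, 32)
mr = np.random.rand(16, 32, 32)

loss_ct_cycle = cycle_loss(ct, f_mr2ct(g_ct2mr(ct)))  # CT -> MR -> CT
loss_mr_cycle = cycle_loss(mr, g_ct2mr(f_mr2ct(mr)))  # MR -> CT -> MR
```

With real (non-identity) generators the cycle losses are nonzero and are minimized jointly with the adversarial terms.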
- the optimizer 248 performs processing for updating network parameters in the learning model 244 based on the calculation result of the error calculation unit 246 .
- from the calculation result of the error calculation unit 246, the optimizer 248 performs a parameter calculation process for calculating update amounts of the network parameters of each of the first generator 220G, the first discriminator 224D, the second discriminator 226D, the second generator 250F, the third discriminator 264D, and the fourth discriminator 266D, and a parameter update process for updating each network parameter according to the calculation result of the parameter calculation process.
- FIG. 13 is a conceptual diagram showing the flow of processing during CT input in the machine learning system 210 according to the second embodiment.
- an example will be described in which each of the first generator 220G and the second generator 250F receives input of a three-dimensional image of isotropic resolution and outputs a three-dimensional generated image of isotropic resolution; however, each generator may instead accept input of a three-dimensional image with anisotropic resolution.
- the three-dimensional data of CT is input to the first generator 220G as a three-dimensional CT image CTr with isotropic resolution through isotropic processing, etc. by the isotropic processing unit 12 .
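The isotropization performed before input to the generator can be sketched as interpolation along the low-resolution axis. The disclosure does not specify the resampling method here, so linear interpolation and the 5 mm / 1 mm spacings below are illustrative assumptions.

```python
import numpy as np

def isotropize_z(volume, slice_spacing, pixel_spacing):
    """Resample a (z, y, x) volume along z by linear interpolation so
    that the voxel size becomes isotropic (z spacing == in-plane spacing).
    """
    nz = volume.shape[0]
    # Physical z positions of the original slices and of the new grid.
    old_z = np.arange(nz) * slice_spacing
    new_z = np.arange(0, old_z[-1] + 1e-6, pixel_spacing)
    # Linear interpolation weights between the two bracketing slices.
    idx = np.clip(np.searchsorted(old_z, new_z) - 1, 0, nz - 2)
    w = ((new_z - old_z[idx]) / slice_spacing)[:, None, None]
    return (1 - w) * volume[idx] + w * volume[idx + 1]

# 5 mm slices, 1 mm in-plane pixels: 8 slices become an ~isotropic stack.
thick = np.random.rand(8, 64, 64)
iso = isotropize_z(thick, slice_spacing=5.0, pixel_spacing=1.0)
print(iso.shape)  # (36, 64, 64): 35 mm of coverage at 1 mm spacing
```

The same function applied along y (after a `moveaxis`) would isotropize a coronal thick-slice series.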
- the first generator 220G receives the CT image CTr, performs CT→MR conversion, and outputs a pseudo MR image MRsyn.
- This pseudo MR image MRsyn is cut into slices (two-dimensional images) in specific cross-sectional directions by the first clipping processing unit 14 and the second clipping processing unit 16, which are input to the first discriminator 224D and the second discriminator 226D, respectively, and authenticity discrimination is performed by each of the first discriminator 224D and the second discriminator 226D.
- the pseudo MR image MRsyn is further input to the second generator 250F, MR→CT conversion is performed by the second generator 250F, and the reconstructed CT image CTrec is output from the second generator 250F.
- the machine learning system 210 evaluates the reconstruction loss that indicates the difference between the reconstructed CT image CTrec output from the second generator 250F and the original CT image CTr.
- This reconstruction loss is an example of the "first reconstruction loss" in the present disclosure.
- a reconstructed CT image CTrec generated by transform processing using the first generator 220G and the second generator 250F in this order is an example of a "first reconstructed generated image" in the present disclosure.
- FIG. 14 is a conceptual diagram showing the flow of processing during MR input in the machine learning system 210 according to the second embodiment.
- the three-dimensional data of MR is subjected to isotropic processing by the isotropic processing unit 12, etc., and is input to the second generator 250F as a three-dimensional MR image MRr of isotropic resolution.
- the second generator 250F receives the input of the MR image MRr, performs MR→CT conversion, and outputs a pseudo CT image CTsyn.
- the isotropic processing performed on the MR three-dimensional data is an example of the “second isotropic processing” in the present disclosure.
- This pseudo CT image CTsyn is cut into slices (two-dimensional images) in specific cross-sectional directions by the third clipping processing unit 254 and the fourth clipping processing unit 256, which are input to the third discriminator 264D and the fourth discriminator 266D, respectively, and authenticity discrimination is performed by each of the third discriminator 264D and the fourth discriminator 266D.
- the pseudo CT image CTsyn is further input to the first generator 220G, CT→MR conversion is performed by the first generator 220G, and the reconstructed MR image MRrec is output from the first generator 220G.
- the difference between the reconstructed MR image MRrec and the original MR image MRr may be calculated in the same manner as in the case of CT input. However, when a thick-slice MR image is used for input, it is preferable to subject the reconstructed MR image MRrec to average pooling processing, convert it to the same size as the thick-slice MR image used for input (before isotropization), and then calculate the error (reconstruction loss) with respect to the original MR image before isotropization.
- This reconstruction loss is an example of the "second reconstruction loss" in the present disclosure.
- the reconstructed MR image MRrec generated by conversion processing using the second generator 250F and the first generator 220G in this order is an example of the "second reconstructed generated image" in the present disclosure.
- FIG. 15 shows an example in which a thick-slice MR image MRax whose axial section has a high resolution is input.
- the machine learning system 210 has an average pooling processing unit 270 .
- the average pooling processing unit 270 performs average pooling processing in the z-axis direction on the isotropic-resolution reconstructed MR image MRrec output from the first generator 220G, restoring three-dimensional data having the same slice interval as the original MR image MRax used for input.
- a reconstruction loss is calculated by comparing the reconstructed MR image MRaxrec output from the average pooling processor 270 and the original MR image MRax.
- FIG. 16 shows an example in which a thick-slice MR image MRco with a high-resolution coronal cross section is input.
- Machine learning system 210 further comprises an average pooling processor 272 .
- the average pooling processing unit 272 performs average pooling processing in the y-axis direction on the isotropic-resolution reconstructed MR image MRrec output from the first generator 220G, restoring three-dimensional data having the same slice interval as the original MR image MRco used for input.
- a reconstruction loss is calculated by comparing the reconstructed MR image MRcorec output from the average pooling processing unit 272 and the original MR image MRco.
- the average pooling processors 270 and 272 may be provided between the second generator 250F and the error calculator 246 in FIG. 12, or may be incorporated in the error calculator 246.
- also when a thick-slice CT image is used for input, the reconstructed CT image CTrec may be subjected to average pooling processing, and the reconstruction loss may be calculated based on the three-dimensional data after conversion by the average pooling processing and the three-dimensional data that is the original input image.
- the average pooling process performed on the reconstructed CT image CTrec is an example of the "first average pooling process" in the present disclosure.
- the average pooling process performed on the reconstructed MR image MRrec is an example of the "second average pooling process" in the present disclosure.
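The average pooling that collapses an isotropized reconstruction back to the slice interval of the thick-slice input can be sketched with a reshape-and-mean. The integer pooling factors and the (z, y, x) layout are assumptions for illustration.

```python
import numpy as np

def average_pool_axis(volume, factor, axis):
    """Average-pool a volume along one axis by an integer factor,
    e.g. collapse an isotropized reconstruction back to the slice
    interval of the thick-slice input (z for axial, y for coronal).
    """
    v = np.moveaxis(volume, axis, 0)
    n = (v.shape[0] // factor) * factor          # drop any remainder
    v = v[:n].reshape(n // factor, factor, *v.shape[1:]).mean(axis=1)
    return np.moveaxis(v, 0, axis)

# Reconstructed isotropic volume (z, y, x), pooled by 5 along z as the
# average pooling processing unit 270 does for axial thick-slice input.
rec = np.random.rand(40, 64, 64)
pooled_z = average_pool_axis(rec, factor=5, axis=0)   # (8, 64, 64)
# Pooling along y instead corresponds to unit 272 (coronal input).
pooled_y = average_pool_axis(rec, factor=4, axis=1)   # (40, 16, 64)
```

The pooled result then has the same slice count as the original thick-slice input, so an element-wise reconstruction loss is well defined.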
- through such learning, the first generator 220G acquires the image generation capability of CT→MR conversion and can serve as a three-dimensional image converter that generates high-resolution pseudo MR images.
- similarly, the second generator 250F acquires the image generation capability of MR→CT conversion and can serve as a three-dimensional image converter that generates high-resolution pseudo CT images.
- FIG. 17 is an example image showing the performance of CT→MR conversion by the trained first generator 220G obtained by performing learning using the machine learning system 210 according to the second embodiment.
- FIG. 17 shows the results of learning using a thin-slice CT data set and a thick-slice MR data set.
- the MR data set used for learning includes only two types of thick slices with high-resolution axial sections and thick slices with high-resolution coronal sections.
- FIG. 17 shows an example of a pseudo MR image generated when a CT thin slice image is input.
- the pseudo MR images generated by the trained first generator 220G are high-definition images with high resolution in each of the axial, coronal, and sagittal cross sections, despite the fact that no thin-slice MR image was used in learning.
- FIG. 18 is an image example showing the performance of MR→CT conversion by the trained second generator 250F obtained by performing learning using the machine learning system 210 according to the second embodiment.
- FIG. 18 shows an example of a pseudo CT image obtained when a thick-slice MR image is input.
- the pseudo CT images generated by the trained second generator 250F are high-definition images with high resolution in each of the axial, coronal, and sagittal sections.
- FIG. 19 is a configuration example of a learning model 344 applied to a machine learning system according to a comparative example.
- the learning model 344 is a 3D-CycleGAN in which the CycleGAN architecture is extended to three-dimensional input and output, and includes generators 320G and 350F configured using three-dimensional CNNs and discriminators 324D and 364D configured using three-dimensional CNNs.
- the generator 320G is an image generation network that performs CT→MR conversion, receives input of CT three-dimensional data, and outputs MR three-dimensional data.
- the generator 350F is an image generation network that performs MR→CT conversion, receives input of MR three-dimensional data, and outputs CT three-dimensional data.
- the discriminator 324D is a three-dimensional discriminator that receives input of three-dimensional data of the pseudo MR image generated by the generator 320G or of a real MR image included in the learning data, and discriminates whether the image is true or false. Similarly, the discriminator 364D is a three-dimensional discriminator that receives input of three-dimensional data of the pseudo CT image generated by the generator 350F or of a real CT image included in the learning data, and discriminates whether the image is true or false.
- the machine learning system includes, in addition to the learning model 344, an error calculator and an optimizer (not shown).
- the pseudo MR image generated by the generator 320G in response to the input of a real CT image is input to the generator 350F, MR→CT conversion is performed by the generator 350F, and a reconstructed CT image is output. Based on this reconstructed CT image and the original real CT image, the reconstruction loss due to CT→MR→CT conversion is evaluated.
- similarly, the pseudo CT image generated by the generator 350F in response to the input of a real MR image is input to the generator 320G, CT→MR conversion is performed by the generator 320G, and a reconstructed MR image is output from the generator 320G. Based on this reconstructed MR image and the original real MR image, the reconstruction loss due to MR→CT→MR conversion is evaluated.
- FIG. 20 shows an example of a generated image obtained when learning is performed using a machine learning system according to a comparative example and using a learning data set similar to that of the second embodiment as learning data.
- the figure is an example of a pseudo-MR image generated by a generator trained on the task of CT to MR conversion.
- in the comparative example, the slice thickness of the learning data domain is also learned at the same time. Therefore, if the MR images used for learning are thick-slice three-dimensional data, the generated images reproduce the image expression of the thick slices, the image quality of each cross section is low, and it is difficult to generate high-definition images.
- FIG. 21 is a block diagram showing a configuration example of an information processing device 400 applied to the machine learning systems 10 and 210.
- the information processing device 400 includes a processor 402, a tangible non-transitory computer-readable medium 404, a communication interface 406, an input/output interface 408, a bus 410, an input device 414, and a display device 416.
- Processor 402 is an example of the "first processor" in this disclosure.
- Computer-readable medium 404 is an example of a "first storage device" in this disclosure.
- the processor 402 includes a CPU (Central Processing Unit). Processor 402 may include a GPU (Graphics Processing Unit). Processor 402 is coupled to computer readable media 404 , communication interface 406 and input/output interface 408 via bus 410 . Input device 414 and display device 416 are connected to bus 410 via input/output interface 408 .
- the computer-readable medium 404 includes memory, which is a main memory device, and storage, which is an auxiliary memory device.
- Computer readable medium 404 may be, for example, a semiconductor memory, a hard disk drive (HDD) device, or a solid state drive (SSD) device, or a combination of some of these.
- the information processing device 400 is connected to an electric communication line (not shown) via a communication interface 406 .
- the telecommunication line may be a wide area communication line, a local communication line, or a combination thereof.
- the computer-readable medium 404 stores multiple programs and data for performing various types of processing.
- the computer-readable medium 404 stores, for example, an isotropic processing program 420, an attitude conversion program 422, a fixed-size region clipping processing program 424, a learning processing program 430, and the like.
- Learning processing program 430 includes learning model 244 , error calculation program 436 , and parameter update program 438 .
- the information processing apparatus 400 including the processor 402 functions as a processing unit corresponding to the program.
- the processor 402 functions as the isotropic processing unit 12 that performs the isotropic processing by executing the instructions of the isotropic processing program 420 .
- similarly, the processor 402 functions as the learning processing units 40 and 240 that perform learning processing by executing the instructions of the learning processing program 430. The same applies to the other programs.
- the computer-readable medium 404 also stores a display control program (not shown).
- the display control program generates display signals necessary for display output to the display device 416 and controls the display of the display device 416 .
- the display device 416 is configured by, for example, a liquid crystal display, an organic electro-luminescence (OEL) display, a projector, or an appropriate combination thereof.
- the input device 414 is configured by, for example, a keyboard, mouse, multi-touch panel, or other pointing device, voice input device, or an appropriate combination thereof. The input device 414 accepts various inputs by the operator.
- FIG. 22 is a block diagram showing an example hardware configuration of a medical image processing apparatus 500 to which a trained model generated by performing learning processing using the machine learning systems 10 and 210 is applied.
- the medical image processing apparatus 500 comprises a processor 502, a tangible non-transitory computer-readable medium 504, a communication interface 506, an input/output interface 508, a bus 510, an input device 514, and a display device 516.
- the hardware configurations of the processor 502, the computer-readable medium 504, the communication interface 506, the input/output interface 508, the bus 510, the input device 514, and the display device 516 are similar to those of the processor 402, the computer-readable medium 404, the communication interface 406, the input/output interface 408, the bus 410, the input device 414, and the display device 416 in the information processing apparatus 400 described with reference to FIG. 21.
- Processor 502 is an example of a "second processor” in this disclosure.
- Computer-readable medium 504 is an example of a "second storage device" in the present disclosure.
- At least one of a CT-MR conversion program 520 and an MR-CT conversion program 530 is stored in the computer-readable medium 504 of the medical image processing apparatus 500 .
- the CT-MR conversion program 520 includes a trained generator 522 that has learned the CT→MR domain conversion.
- Trained generator 522 is a trained model corresponding to the generator 20G in FIG. 5 or the first generator 220G in FIG. 12. Trained generator 522 is an example of a "first trained model" in this disclosure.
- a CT image input to the first generator 220G is an example of a "first medical image” in the present disclosure.
- a pseudo MR image output from the first generator 220G is an example of a "second medical image” in the present disclosure.
- a pseudo MR image output from the trained generator 522 is an example of a “second medical image” in the present disclosure.
- the MR-CT conversion program 530 includes a trained generator 532 that has learned the MR→CT domain conversion.
- the trained generator 532 is a trained model corresponding to the second generator 250F in FIG. 12.
- the computer-readable medium 504 may further include at least one of an isotropic processing program 420 , an organ recognition AI program 540 , a disease detection AI program 542 and a report generation support program 544 .
- the isotropic processing program 420 may be included in each of the CT-MR conversion program 520 and the MR-CT conversion program 530 .
- the organ recognition AI program 540 includes a processing module that performs organ segmentation.
- the organ recognition AI program 540 may include a lung segment labeling program, a blood vessel region extraction program, a bone labeling program, and the like.
- the disease detection AI program 542 includes detection processing modules corresponding to specific diseases.
- the disease detection AI program 542 may include, for example, at least one of a lung nodule detection program, a lung nodule characterization program, a pneumonia CAD program, a mammary gland CAD program, a liver CAD program, a brain CAD program, and a colon CAD program.
- the report creation support program 544 includes a trained document generation model that generates finding sentence candidates corresponding to target medical images.
- Various processing programs such as the organ recognition AI program 540, the disease detection AI program 542, and the report creation support program 544 may each be an AI processing module containing a trained model obtained by applying machine learning, such as deep learning, so as to produce the desired task output.
- AI models for CAD can be constructed using, for example, various CNNs with convolutional layers.
- Input data for the AI model includes, for example, medical images such as two-dimensional images, three-dimensional images, or moving images; output from the AI model may be, for example, information indicating the position of a diseased area (lesion site) in the image, information indicating a class classification such as a disease name, or a combination thereof.
- AI models that handle time-series data, document data, etc. can be constructed using, for example, various recurrent neural networks (RNN).
- the time-series data includes, for example, electrocardiogram waveform data.
- the document data includes, for example, observation statements created by doctors.
- the generated images produced by the CT-MR conversion program 520 or the MR-CT conversion program 530 can be input to at least one of the organ recognition AI program 540, the disease detection AI program 542, and the report creation support program 544. This makes it possible to apply an AI processing module built for a specific modality to images of other modalities, expanding the scope of application.
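This chaining of modality conversion and a downstream AI module amounts to function composition. The sketch below is purely illustrative: all function names are hypothetical stand-ins, and the identity conversion and fixed detector output are placeholders, not the disclosed implementations.

```python
def ct_to_mr(ct_volume):
    # Hypothetical stand-in for the trained generator 522 (CT -> pseudo MR).
    return ct_volume  # identity placeholder, not a real conversion

def mr_lesion_detector(mr_volume):
    # Hypothetical stand-in for a detection AI built for MR input;
    # returns a list of lesion candidates (fixed dummy output here).
    return [{"position": (10, 20, 30), "score": 0.9}]

def detect_on_ct(ct_volume):
    # An MR-specific module becomes applicable to CT data by inserting
    # the modality conversion in front of it.
    return mr_lesion_detector(ct_to_mr(ct_volume))

dummy_ct = [[0.0]]            # placeholder volume
findings = detect_on_ct(dummy_ct)
```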
- FIG. 23 is a conceptual diagram showing an outline of processing of the machine learning system 600 according to the third embodiment.
- in the third embodiment, a method of learning a super-resolution image generation task (super-resolution task) for generating a high-resolution three-dimensional image from a low-resolution three-dimensional image, targeting MR images, will be described.
- the low-resolution three-dimensional MR images used as input are an axial image series with high resolution only for the axial section (low resolution for the other sections) and a coronal image series with high resolution only for the coronal section (low resolution for the other sections).
- An axial image series is three-dimensional data with a lower resolution in the z-axis direction than in the other two axis directions, and is understood as a "z-axis direction low-resolution image".
- A coronal image series is three-dimensional data with a lower resolution in the y-axis direction than in the other two axis directions, and is understood as a "y-axis direction low-resolution image".
- the axial image series will be referred to as "axial three-dimensional images”
- the coronal image series will be referred to as "coronal three-dimensional images”.
- Super-resolution in the third embodiment means slice interpolation for interpolating data in the slice thickness direction (axial direction) with low resolution.
- an image pair of an axial three-dimensional image and a coronal three-dimensional image obtained by photographing the same part of the same patient and performing three-dimensional alignment is used as data for learning.
- An image group including a plurality of image pairs in which an axial three-dimensional image and a coronal three-dimensional image are associated is used as a training data set.
- the machine learning system 600 includes a generator 610 that performs first super-resolution processing, a generator 612 that performs second super-resolution processing, an axial image clipping processing unit 620, a coronal image clipping processing unit 622, a discriminator 630 for discriminating the authenticity of axial images, and a discriminator 632 for discriminating the authenticity of coronal images.
- Each of generators 610 and 612 is a generation network constructed using a 3D CNN.
- the network structure of each of generators 610 and 612 may be similar to that of the generator 20G in the first embodiment.
- Each of the discriminators 630 and 632 is a discrimination network constructed using a two-dimensional CNN.
- the network structure of each of the discriminators 630 and 632 may be the same as that of the discriminators 24D and 26D in the first embodiment.
- the first super-resolution processing includes super-resolution processing in the z-axis direction.
- the generator 610 accepts an input of an axial 3D image and outputs an isotropic resolution 3D generated image.
- the second super-resolution processing includes super-resolution processing in the y-axis direction.
- the generator 612 accepts input of a coronal 3D image and outputs an isotropic resolution 3D generated image. Note that the notation "SR" in the figure represents processing for super resolution.
- the axial image clipping processing unit 620 performs clipping processing for extracting a two-dimensional image of an axial section from the three-dimensional generated image SRsyn generated by the generator 610 or generator 612 .
- the coronal image clipping processing unit 622 performs clipping processing for extracting a two-dimensional image of a coronal cross section from the three-dimensional generated image SRsyn generated by the generator 610 or generator 612 .
- the discriminator 630 receives input of a two-dimensional image extracted from the three-dimensional generated image SRsyn by the axial image clipping processing unit 620, or a two-dimensional image that is a slice image of an axial three-dimensional image included in the learning data set, and discriminates whether the image is a real image or a fake image.
- the discriminator 632 receives input of a two-dimensional image extracted from the three-dimensional generated image SRsyn by the coronal image clipping processing unit 622, or a two-dimensional image that is a slice image of a coronal three-dimensional image included in the learning data set, and discriminates whether the image is a real image or a fake image.
- When an axial high-resolution image is input to the generator 610, a two-dimensional image is cut out in the coronal cross-sectional direction from the three-dimensional generated image produced by the first super-resolution processing of the generator 610, and an error (absolute error) with respect to the corresponding correct coronal image is calculated.
- The machine learning system 600 repeats adversarial learning for the generators 610 and 612 and the discriminators 630 and 632 to improve the performance of both.
- As a result, a trained generator 610 that generates a high-definition three-dimensional image of isotropic resolution from a low-resolution axial three-dimensional image, and a trained generator 612 that generates a high-definition three-dimensional image of isotropic resolution from a low-resolution coronal three-dimensional image, can be obtained.
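The adversarial objective driving this training can be sketched numerically as follows; the scalar scores and the non-saturating loss form are illustrative assumptions, since the patent only states that the generators and discriminators are trained adversarially:

```python
import numpy as np

# Toy stand-ins for one adversarial round. In the patent the generators
# are 3D CNNs and the discriminators 2D CNNs; those networks are not
# reproduced here.
def discriminator_loss(real_score, fake_score):
    # D is rewarded for scoring real slices high and generated slices low.
    return -np.log(real_score) - np.log(1.0 - fake_score)

def generator_loss(fake_score):
    # G is rewarded for making D score its output high.
    return -np.log(fake_score)

real_score, fake_score = 0.9, 0.2  # hypothetical discriminator outputs
d_loss = discriminator_loss(real_score, fake_score)
g_loss = generator_loss(fake_score)
```

Alternating minimization of these two losses is what "repeating adversarial learning to improve the performance of both" refers to.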
- The axial three-dimensional image used for learning in the third embodiment is an example of the "three-dimensional image captured under the first imaging condition" in the present disclosure, and the coronal three-dimensional image is an example of the "three-dimensional image captured under the second imaging condition" in the present disclosure.
- The z-axis direction in the axial three-dimensional image is an example of the "first axis direction" in the present disclosure, and the axial three-dimensional image is an example of the "first-axis low-resolution three-dimensional data" in the present disclosure.
- The y-axis direction in the coronal three-dimensional image is an example of the "second axis direction" in the present disclosure, and the coronal three-dimensional image is an example of the "second-axis low-resolution three-dimensional data" in the present disclosure.
- FIG. 24 is a conceptual diagram showing an outline of processing in the machine learning system 602 according to the fourth embodiment. In FIG. 24, elements that are the same as or similar to those in the configuration shown in FIG. 23 are denoted by the same reference numerals, and duplicate descriptions are omitted. Regarding the configuration shown in FIG. 24, points different from FIG. 23 will be described.
- In the third embodiment, an example was described in which the two discriminators 630 and 632 are used to discriminate the authenticity of the three-dimensional generated image SRsyn generated by the generator 610 or the generator 612.
- In the fourth embodiment, processing that reduces the resolution of the three-dimensional generated image is added to the architecture of the third embodiment, a mechanism is incorporated that evaluates the reconstruction loss of a conversion process in which super-resolution processing and the resolution reduction processing corresponding to its inverse transform are performed in this order, and one of the discriminators 630 and 632 is used for a given three-dimensional generated image.
- The machine learning system 602 shown in FIG. 24 includes a resolution reduction processing unit 614 that performs resolution reduction processing in the z-axis direction on the three-dimensional generated image SRsyn, and a resolution reduction processing unit 616 that performs resolution reduction processing in the y-axis direction on the three-dimensional generated image SRsyn.
- The resolution reduction by the resolution reduction processing unit 614 corresponds to inverse conversion processing for the first super-resolution processing of the generator 610.
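One plausible form of such an inverse conversion is block averaging along the z-axis; this is only a sketch, as the patent does not fix the exact resolution reduction operation:

```python
import numpy as np

def downsample_z(vol, factor):
    """Reduce resolution along the z-axis by block averaging.
    A simple stand-in for a resolution reduction unit; the actual
    operation used in the patent is not specified here."""
    z, y, x = vol.shape
    assert z % factor == 0
    # Group every `factor` consecutive slices and average them.
    return vol.reshape(z // factor, factor, y, x).mean(axis=1)

vol = np.arange(8.0 * 4 * 4).reshape(8, 4, 4)  # hypothetical thin-slice volume
low = downsample_z(vol, 4)                     # 8 thin slices -> 2 thick slices
```

Applying this after super-resolution should approximately recover the original thick-slice input, which is what the reconstruction loss measures.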
- The machine learning system 602 evaluates the reconstruction loss based on the original input axial three-dimensional image and the axial three-dimensional reconstructed generated image, and updates the parameters of the generator 610. Note that the notation "LR" in the figure represents low-resolution processing.
- The resolution reduction by the resolution reduction processing unit 616 corresponds to inverse conversion processing for the second super-resolution processing of the generator 612.
- Through this resolution reduction, a reconstructed generated image (coronal three-dimensional reconstructed generated image) corresponding to the coronal three-dimensional image used for the input is obtained.
- The machine learning system 602 evaluates the reconstruction loss based on the original input coronal three-dimensional image and the coronal three-dimensional reconstructed generated image, and updates the parameters of the generator 612.
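A minimal sketch of such a reconstruction loss, assuming a mean absolute error between the original input and the reconstructed generated image (the exact loss formula is not fixed in the text):

```python
import numpy as np

def reconstruction_loss(original, reconstructed):
    """Mean absolute error between the original low-resolution input and
    the image recovered by super-resolution followed by its inverse
    (resolution reduction). Illustrative assumption only."""
    return np.mean(np.abs(original - reconstructed))

# Hypothetical data: a perfect reconstruction would give zero loss.
original = np.zeros((2, 4, 4))
reconstructed = np.full((2, 4, 4), 0.5)
loss = reconstruction_loss(original, reconstructed)
```

Minimizing this loss pushes the super-resolution generator to stay consistent with its low-resolution input, in the same spirit as the cycle-consistency loss of CycleGAN.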
- This structure is similar to that of CycleGAN, and there is no need for a pair relationship between the axial three-dimensional images and the coronal three-dimensional images used for learning; it is sufficient to have a learning data group for each of the two types of three-dimensional images.
- A configuration is adopted in which learning is repeated on randomly given axial three-dimensional images and coronal three-dimensional images.
- FIG. 25 shows the processing flow in the machine learning system 602 when an axial three-dimensional image is input.
- The axial three-dimensional image is input to the generator 610, and the generator 610 outputs the three-dimensional generated image SRsyn1.
- This three-dimensional generated image SRsyn1 is subjected to resolution reduction by the resolution reduction processing unit 614 to generate an axial three-dimensional reconstructed generated image, and a reconstruction loss is calculated.
- In addition, a two-dimensional image of a coronal section is cut out from the three-dimensional generated image SRsyn1 output from the generator 610, and the discriminator 632 is used to determine the authenticity of the coronal image.
- The machine learning system 602 repeats adversarial learning for the generator 610 and the discriminator 632 to improve the performance of both.
- FIG. 26 shows the processing flow in the machine learning system 602 when a coronal three-dimensional image is input.
- The coronal three-dimensional image is input to the generator 612, and the three-dimensional generated image SRsyn2 is output from the generator 612.
- This three-dimensional generated image SRsyn2 is subjected to resolution reduction by the resolution reduction processing unit 616 to generate a coronal three-dimensional reconstructed generated image, and a reconstruction loss is calculated.
- In addition, a two-dimensional image of an axial section is cut out from the three-dimensional generated image SRsyn2 output from the generator 612, and the discriminator 630 is used to determine the authenticity of the axial image.
- The machine learning system 602 repeats adversarial learning for the generator 612 and the discriminator 630 to improve the performance of both.
- As a result, a trained generator 610 that generates a high-definition three-dimensional image of isotropic resolution from a low-resolution axial three-dimensional image, and a trained generator 612 that generates a high-definition three-dimensional image of isotropic resolution from a low-resolution coronal three-dimensional image, can be obtained.
- <<Modification 4>> As other examples of domain conversion, the technology of the present disclosure can also be applied to conversion between different image types within MR, such as T1-weighted, T2-weighted, fat-suppressed, contrast-enhanced, and non-contrast-enhanced images, or to conversion between contrast-enhanced and non-contrast-enhanced images within CT.
- The technology of the present disclosure is applicable not only to CT images and MR images but also to various medical images, such as ultrasound images that project human body information and PET images captured using a positron emission tomography (PET) apparatus.
- the technology of the present disclosure can be applied not only to medical images captured by medical equipment, but also to three-dimensional images for various purposes captured by various imaging devices.
- FIG. 27 is a block diagram showing an example of the hardware configuration of a computer.
- Computer 800 may be a personal computer, a workstation, or a server computer.
- the computer 800 can be used as a part or all of any of the machine learning systems 10, 11, 210, 600, 602 and the medical image processing apparatus 500 already described, or as a device having a plurality of these functions.
- The computer 800 includes a CPU 802, a RAM (Random Access Memory) 804, a ROM (Read Only Memory) 806, a GPU 808, a storage 810, a communication unit 812, an input device 814, a display device 816, and a bus 818.
- The GPU 808 may be provided as needed.
- The CPU 802 reads various programs stored in the ROM 806, the storage 810, and the like, and executes various processes.
- The RAM 804 is used as a work area for the CPU 802. The RAM 804 is also used as a storage unit that temporarily stores the read programs and various data.
- the storage 810 includes, for example, a hard disk device, an optical disk, a magneto-optical disk, or a semiconductor memory, or a storage device configured using an appropriate combination thereof.
- Various programs, data, and the like are stored in the storage 810 .
- a program stored in the storage 810 is loaded into the RAM 804 and executed by the CPU 802, whereby the computer 800 functions as means for performing various processes defined by the program.
- the communication unit 812 is an interface that performs wired or wireless communication processing with an external device and exchanges information with the external device.
- the communication unit 812 can serve as an information acquisition unit that receives input such as an image.
- the input device 814 is an input interface that receives various operational inputs to the computer 800 .
- Input device 814 may be, for example, a keyboard, mouse, multi-touch panel, or other pointing device, or voice input device, or any suitable combination thereof.
- the display device 816 is an output interface that displays various information.
- the display device 816 may be, for example, a liquid crystal display, an organic electro-luminescence (OEL) display, a projector, or an appropriate combination thereof.
- A program that causes a computer to implement part or all of at least one processing function can be recorded on a computer-readable medium, that is, a tangible non-transitory information storage medium such as an optical disk, a magnetic disk, or a semiconductor memory, and the program can be provided through this information storage medium.
- At least one of the various processing functions, such as the image acquisition function, the preprocessing function, and the learning processing function in the machine learning systems 10, 11, 210, 600, and 602, and the image processing function in the medical image processing apparatus 500, may be realized by cloud computing, and can also be provided as a SaaS (Software as a Service) service.
- Examples of the various processors include a CPU, which is a general-purpose processor that executes programs and functions as various processing units; a GPU, which is a processor specialized for image processing; a programmable logic device (PLD), such as an FPGA (Field Programmable Gate Array), whose circuit configuration can be changed after manufacture; and a dedicated electric circuit, such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed exclusively for executing specific processing.
- a single processing unit may be composed of one of these various processors, or may be composed of two or more processors of the same type or different types.
- one processing unit may be configured by a plurality of FPGAs, a combination of CPU and FPGA, or a combination of CPU and GPU.
- a plurality of processing units may be configured by one processor.
- First, as typified by computers such as clients and servers, there is a form in which a single processor is configured by a combination of one or more CPUs and software, and this processor functions as a plurality of processing units.
- Second, as typified by a system on chip (SoC), there is a form in which a processor that implements the functions of an entire system including a plurality of processing units with a single IC (Integrated Circuit) chip is used.
- the various processing units are configured using one or more of the above various processors as a hardware structure.
- the hardware structure of these various processors is, more specifically, an electrical circuit that combines circuit elements such as semiconductor elements.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Veterinary Medicine (AREA)
- Public Health (AREA)
- Animal Behavior & Ethology (AREA)
- Surgery (AREA)
- Heart & Thoracic Surgery (AREA)
- Probability & Statistics with Applications (AREA)
- Pathology (AREA)
- High Energy & Nuclear Physics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
Abstract
Description
Typical examples of apparatuses that capture medical images include modalities such as a CT apparatus or an MRI apparatus. In these modalities, as a basic principle, three-dimensional data representing the three-dimensional form of an object is obtained by continuously capturing two-dimensional slice images. In this specification, the term "three-dimensional data" includes the concept of a collection of continuously captured two-dimensional slice images and is synonymous with a three-dimensional image. The term "image" includes the meaning of image data. A collection of consecutive two-dimensional slice images is sometimes called a "two-dimensional image sequence" or a "two-dimensional image series." The term "two-dimensional image" includes the concept of a two-dimensional slice image extracted from three-dimensional data.
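The notion of three-dimensional data as a stack of consecutive two-dimensional slice images can be sketched as follows; the shapes are illustrative assumptions:

```python
import numpy as np

# A "two-dimensional image series": consecutive slice images stacked
# into three-dimensional data. The slice count and matrix size are
# hypothetical.
slices = [np.zeros((256, 256)) for _ in range(40)]
volume = np.stack(slices, axis=0)

# A two-dimensional slice image taken back out of the three-dimensional data.
slice_20 = volume[20]
```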
医療画像を対象とするドメイン変換(ここでは、異種モダリティ画像生成)のタスクを深層学習ベースのアルゴリズムを用いて学習させる場合、既述のとおり、学習に用いるデータの収集が課題の1つである。異種のモダリティにおいて、同じ撮影範囲を、同じ解像度の条件にて撮影したデータを十分に揃えることは困難である。多くの場合、モダリティごとに撮影時の解像度の条件が異なる。
In the first embodiment, an example of a machine learning system is described that realizes cross-domain image generation (image conversion) capable of producing high-resolution generated images in all three axial directions (that is, for each of the three types of cross sections), even when using a learning data set in which three-dimensional data with low resolution in some axial directions, as illustrated in FIG. 3, are mixed.
FIG. 6 is a functional block diagram showing a configuration example of the machine learning system 10 according to the first embodiment. The machine learning system 10 includes a learning data generation unit 30 and a learning processing unit 40. The machine learning system 10 may further include an image storage unit 50 and a learning data storage unit 54.
FIG. 7 is a functional block diagram showing a configuration example of the learning data generation unit 30. The learning data generation unit 30 includes an isotropization processing unit 12, a posture transformation unit 32, and a fixed-size region clipping processing unit 34. For example, the learning data generation unit 30 applies posture transformation in the posture transformation unit 32 to three-dimensional data whose per-pixel size in each of the x-axis, y-axis, and z-axis directions has been isotropized to 1 mm by the isotropization processing unit 12, and then randomly clips a fixed-size region using the fixed-size region clipping processing unit 34. The fixed-size region may be a cubic three-dimensional region whose numbers of pixels in the x-axis, y-axis, and z-axis directions are, for example, "160 x 160 x 160".
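A minimal sketch of the random fixed-size region clipping described above, assuming uniform sampling of the crop origin (the sampling scheme is an assumption; the text only says the region is clipped randomly):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_fixed_size_crop(vol, size=160):
    """Randomly cut out a cubic fixed-size region from an isotropized
    volume, as done by the fixed-size region clipping processing unit 34."""
    z0 = rng.integers(0, vol.shape[0] - size + 1)
    y0 = rng.integers(0, vol.shape[1] - size + 1)
    x0 = rng.integers(0, vol.shape[2] - size + 1)
    return vol[z0:z0 + size, y0:y0 + size, x0:x0 + size]

# Hypothetical isotropized volume larger than the crop in every direction.
vol = np.zeros((170, 180, 190), dtype=np.float32)
patch = random_fixed_size_crop(vol, 160)
```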
In the machine learning system 10 according to the first embodiment, the discriminators 24D and 26D used for authenticity discrimination are switched according to the input three-dimensional data. That is, when an image pair consisting of an MR image whose axial sections are high resolution and the corresponding CT image is input, discrimination of the generated image converted by the generator 20G is performed by the discriminator 24D, which evaluates two-dimensional images of the axial section.
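The switching logic can be sketched schematically as follows; the function name and string labels are hypothetical, and the mapping follows the description above (data with high-resolution axial sections, that is, low resolution along the z-axis, is judged by the discriminator 24D on axial sections):

```python
def select_discriminator(low_res_axis):
    """Choose which discriminator judges the generated image, based on
    which axis of the input learning data is low resolution.
    Labels are illustrative stand-ins for the networks themselves."""
    if low_res_axis == "z":
        # Axial (x-y) sections are high resolution: judge axial slices.
        return "discriminator_24D"
    elif low_res_axis == "y":
        # Coronal (z-x) sections are high resolution: judge those slices.
        return "discriminator_26D"
    raise ValueError(f"unexpected axis: {low_res_axis}")

chosen = select_discriminator("z")
```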
FIG. 9 is a conceptual diagram showing Modification 1 of the first embodiment. In FIG. 9, elements common to FIG. 5 are given the same reference numerals, and duplicate descriptions are omitted. Regarding the configuration shown in FIG. 9, points different from FIG. 5 will be described.
In the first embodiment, an example was described in which z-axis low-resolution anisotropic three-dimensional data and y-axis low-resolution anisotropic three-dimensional data are used as learning data; however, the combination of the two types of low-resolution data is not limited to this example.
[Combination 1] A combination of three-dimensional data with low resolution in the z-axis direction and three-dimensional data with low resolution in the y-axis direction
[Combination 2] A combination of three-dimensional data with low resolution in the y-axis direction and three-dimensional data with low resolution in the x-axis direction
[Combination 3] A combination of three-dimensional data with low resolution in the x-axis direction and three-dimensional data with low resolution in the z-axis direction
In the first embodiment, an example of CT-to-MR conversion that generates a pseudo MR image from a real CT image was described; however, using a data set similar to the learning data used for training in the first embodiment (a data set including thick-slice data), it is also possible to train a generator that performs MR-to-CT conversion to generate a pseudo CT image from an MR image.
In the case of medical images, it is often difficult to prepare corresponding image pairs across different modalities. The second embodiment adopts an architecture based on the CycleGAN mechanism described in Non-Patent Document 2, and describes an example in which the domain conversion task is learned using unpaired image groups of the respective domains as learning data.
FIG. 13 is a conceptual diagram showing the flow of processing at the time of CT input in the machine learning system 210 according to the second embodiment. In the following description, a configuration is described in which each of the first generator 220G and the second generator 250F receives an input of an isotropic-resolution three-dimensional image and outputs an isotropic-resolution three-dimensional generated image; however, as described above, a generator that accepts an input of an anisotropic-resolution three-dimensional image may also be used.
FIG. 14 is a conceptual diagram showing the flow of processing at the time of MR input in the machine learning system 210 according to the second embodiment. The MR three-dimensional data undergoes isotropization processing and the like by the isotropization processing unit 12 and is input to the second generator 250F as an isotropic-resolution three-dimensional MR image MRr. The second generator 250F receives the input of the MR image MRr, performs MR-to-CT conversion, and outputs a pseudo CT image CTsyn. The isotropization processing performed on the MR three-dimensional data is an example of the "second isotropization processing" in the present disclosure.
By performing learning using the machine learning system 210 according to the second embodiment, the first generator 220G acquires the image generation capability of CT-to-MR conversion and can serve as a three-dimensional image converter that generates high-resolution pseudo MR images. The second generator 250F acquires the image generation capability of MR-to-CT conversion and can serve as a three-dimensional image converter that generates high-resolution pseudo CT images.
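At inference time, the flow of isotropizing thick-slice MR data and feeding it to the trained MR-to-CT generator can be sketched as follows; the nearest-neighbour resampling and the placeholder generator are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def isotropize(vol, zoom_z):
    """Nearest-neighbour resampling along z so that voxel spacing becomes
    equal on all axes; a crude stand-in for the isotropization unit 12."""
    idx = (np.arange(vol.shape[0] * zoom_z) // zoom_z).astype(int)
    return vol[idx]

def fake_generator_250F(mr_vol):
    """Placeholder for the trained MR-to-CT generator; a real system
    would run a 3D CNN here."""
    return mr_vol * 0.5

mr = np.ones((10, 32, 32))          # hypothetical thick-slice MR data
mr_iso = isotropize(mr, zoom_z=4)   # isotropic-resolution MR image MRr
ct_syn = fake_generator_250F(mr_iso)  # pseudo CT image CTsyn
```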
FIG. 19 shows a configuration example of a learning model 344 applied to a machine learning system according to a comparative example. The learning model 344 is a 3D-CycleGAN that extends the CycleGAN architecture to three-dimensional input and output, and includes generators 320G and 350F configured using three-dimensional CNNs and discriminators 324D and 364D configured using three-dimensional CNNs.
FIG. 21 is a block diagram showing a configuration example of an information processing apparatus 400 applied to the machine learning systems 10 and 210. The information processing apparatus 400 includes a processor 402, a tangible non-transitory computer-readable medium 404, a communication interface 406, an input/output interface 408, a bus 410, an input device 414, and a display device 416. The processor 402 is an example of the "first processor" in the present disclosure. The computer-readable medium 404 is an example of the "first storage device" in the present disclosure.
FIG. 22 is a block diagram showing an example of the hardware configuration of a medical image processing apparatus 500 to which a trained model generated by performing learning processing using the machine learning systems 10 and 210 is applied.
So far, image generation tasks between different modalities have been described as examples of domain conversion. In the third embodiment, an example of a super-resolution task is shown in which the source domain is thick slices (that is, low resolution) and the target domain is thin slices (that is, high resolution).
FIG. 24 is a conceptual diagram showing an outline of processing in the machine learning system 602 according to the fourth embodiment. In FIG. 24, elements that are the same as or similar to those in the configuration shown in FIG. 23 are given the same reference numerals, and duplicate descriptions are omitted. Regarding the configuration shown in FIG. 24, points different from FIG. 23 will be described.
It is not necessary to carry out both the processing flow for super-resolving an axial three-dimensional image shown in FIG. 25 and the processing flow for super-resolving a coronal three-dimensional image shown in FIG. 26. For example, when only a super-resolution task that takes an axial three-dimensional image as input is to be realized, learning is possible with the processing flow of FIG. 25 alone.
As other examples of domain conversion, the technology of the present disclosure can also be applied to conversion between different image types within MR, such as T1-weighted images, T2-weighted images, fat-suppressed images, contrast-enhanced images, and non-contrast-enhanced images, or to conversion between contrast-enhanced and non-contrast-enhanced images within CT.
The technology of the present disclosure is applicable not only to CT images and MR images but also to various medical images, such as ultrasound images that project human body information and PET images captured using a positron emission tomography (PET) apparatus. Furthermore, the technology of the present disclosure can be applied not only to medical images captured by medical equipment but also to three-dimensional images for various purposes captured by various imaging apparatuses.
FIG. 27 is a block diagram showing an example of the hardware configuration of a computer. The computer 800 may be a personal computer, a workstation, or a server computer. The computer 800 can be used as part or all of any of the machine learning systems 10, 11, 210, 600, and 602 and the medical image processing apparatus 500 already described, or as an apparatus having a plurality of these functions.
A program that causes a computer to implement part or all of at least one of the various processing functions described in the above embodiments, such as the image acquisition function, the preprocessing function, and the learning processing function in the machine learning systems 10, 11, 210, 600, and 602, and the image processing function in the medical image processing apparatus 500, can be recorded on a computer-readable medium, that is, a tangible non-transitory information storage medium such as an optical disk, a magnetic disk, or a semiconductor memory, and the program can be provided through this information storage medium.
The hardware structure of the processing units that execute various kinds of processing, such as the isotropization processing unit 12, the generator 20G, the first clipping processing unit 14, the second clipping processing unit 16, the clipping processing unit 18, the discriminators 24D, 26D, and 28D, the learning data generation unit 30, the posture transformation unit 32, the fixed-size region clipping processing unit 34, the learning processing unit 40, the image acquisition unit 42, the error calculation units 46 and 246, the optimizers 48 and 248, the preprocessing unit 230, the first generator 220G, the second generator 250F, the third clipping processing unit 254, the fourth clipping processing unit 256, the first discriminator 224D, the second discriminator 226D, the third discriminator 264D, the fourth discriminator 266D, the average pooling processing units 270 and 272, the generators 610 and 612, the discriminators 630 and 632, the axial image clipping processing unit 620, the coronal image clipping processing unit 622, and the resolution reduction processing units 614 and 616, is, for example, one of the following various processors.
The configurations of the embodiments of the present invention described above can be appropriately modified, added to, or partially deleted without departing from the spirit of the present invention. The present invention is not limited to the embodiments described above, and many modifications are possible by those with ordinary knowledge in the field within the technical idea of the present invention.
12 isotropization processing unit
14 first clipping processing unit
16 second clipping processing unit
18 clipping processing unit
20G generator
24D discriminator
26D, 28D discriminator
30 learning data generation unit
32 posture transformation unit
34 fixed-size region clipping processing unit
40 learning processing unit
42 image acquisition unit
44 learning model
46 error calculation unit
48 optimizer
50 image storage unit
54 learning data storage unit
100 machine learning system
112 isotropization processing unit
120F generator
124D discriminator
210 machine learning system
220G first generator
224D first discriminator
226D second discriminator
230 preprocessing unit
240 learning processing unit
244 learning model
246 error calculation unit
248 optimizer
250F second generator
254 third clipping processing unit
256 fourth clipping processing unit
264D third discriminator
266D fourth discriminator
270, 272 average pooling processing unit
320G generator
324D discriminator
344 learning model
350F generator
364D discriminator
400 information processing apparatus
402 processor
404 computer-readable medium
406 communication interface
408 input/output interface
410 bus
414 input device
416 display device
420 isotropization processing program
422 posture transformation program
424 fixed-size region clipping processing program
430 learning processing program
436 error calculation program
438 parameter update program
500 medical image processing apparatus
502 processor
504 computer-readable medium
506 communication interface
508 input/output interface
510 bus
514 input device
516 display device
520 CT-MR conversion program
522 trained generator
530 MR-CT conversion program
532 trained generator
540 organ recognition AI program
542 disease detection AI program
544 report creation support program
600, 602 machine learning system
610, 612 generator
614, 616 resolution reduction processing unit
620 axial image clipping processing unit
622 coronal image clipping processing unit
630, 632 discriminator
800 computer
802 CPU
804 RAM
806 ROM
808 GPU
810 storage
812 communication unit
814 input device
816 display device
818 bus
CTr CT image
CTrec reconstructed CT image
CTsyn pseudo CT image
MRax MR image
MRaxrec reconstructed MR image
MRco MR image
MRcorec reconstructed MR image
MRr MR image
MRrec reconstructed MR image
MRsyn pseudo MR image
SRsyn, SRsyn1, SRsyn2 three-dimensional generated image
Claims (33)
- A method of generating a trained model that converts the domain of an input three-dimensional image and outputs a three-dimensional generated image of a different domain, the method using a learning model having the structure of a generative adversarial network including: a first generator configured using a three-dimensional convolutional neural network that receives an input of a three-dimensional image of a first domain and outputs a three-dimensional generated image of a second domain different from the first domain; and a first discriminator configured using a two-dimensional convolutional neural network that receives an input of a two-dimensional image representing a cross-sectional image in a first slice plane direction cut out from the three-dimensional generated image of the second domain generated by the first generator, and discriminates the authenticity of the input two-dimensional image, the method comprising, by a computer: acquiring a plurality of pieces of learning data including a three-dimensional image captured under a first imaging condition and a three-dimensional image captured under a second imaging condition different from the first imaging condition; and performing learning processing of adversarially training the first generator and the first discriminator based on the plurality of pieces of learning data.
- The method of generating a trained model according to claim 1, further comprising, by the computer: performing first clipping processing of cutting out a two-dimensional image representing a cross-sectional image in the first slice plane direction from the three-dimensional generated image of the second domain generated by the first generator; and inputting the two-dimensional image cut out by the first clipping processing to the first discriminator.
- The method of generating a trained model according to claim 1 or 2, wherein the first imaging condition includes that the device used for imaging is a first imaging device, and the second imaging condition includes that the device used for imaging is a second imaging device of a type different from the first imaging device.
- The method of generating a trained model according to any one of claims 1 to 3, wherein the first imaging condition includes that the resolution condition is a first resolution condition, and the second imaging condition includes that the resolution condition is a second resolution condition different from the first resolution condition.
- The method of generating a trained model according to any one of claims 1 to 4, wherein at least one of the first imaging condition and the second imaging condition includes, as a resolution condition, that the resolution in one axis direction among three orthogonal axes is lower than the resolution in each of the other two axis directions.
- The method of generating a trained model according to any one of claims 1 to 4, wherein, as the three-dimensional image captured under the second imaging condition, anisotropic three-dimensional data in which the resolution in one axis direction among three orthogonal axes is lower than the resolution in each of the other two axis directions is used, and the first slice plane direction is a slice plane direction parallel to the other two axis directions in which the resolution is relatively high in the anisotropic three-dimensional data.
- The method of generating a trained model according to any one of claims 1 to 6, wherein the learning model further includes a second discriminator configured using a two-dimensional convolutional neural network that receives an input of a two-dimensional image representing a cross-sectional image in a second slice plane direction orthogonal to the first slice plane direction cut out from the three-dimensional generated image of the second domain generated by the first generator, and discriminates the authenticity of the input two-dimensional image, and the learning processing includes processing of adversarially training the first generator and the second discriminator.
- The method of generating a trained model according to claim 7, further comprising, by the computer: performing second clipping processing of cutting out a two-dimensional image representing a cross-sectional image in the second slice plane direction from the three-dimensional generated image of the second domain generated by the first generator; and inputting the two-dimensional image cut out by the second clipping processing to the second discriminator.
- The method of generating a trained model according to claim 7 or 8, wherein, as the learning data, z-axis low-resolution anisotropic three-dimensional data in which the resolution in the z-axis direction among the three orthogonal axes of x, y, and z is lower than the resolution in each of the x-axis and y-axis directions, and y-axis low-resolution anisotropic three-dimensional data in which the resolution in the y-axis direction is lower than the resolution in each of the z-axis and x-axis directions, are used, the first slice plane direction is a slice plane direction parallel to the x-axis direction and the y-axis direction, and the second slice plane direction is a slice plane direction parallel to the z-axis direction and the x-axis direction.
- The method of generating a trained model according to claim 7 or 8, wherein, as the learning data, y-axis low-resolution anisotropic three-dimensional data in which the resolution in the y-axis direction among the three orthogonal axes of x, y, and z is lower than the resolution in each of the z-axis and x-axis directions, and x-axis low-resolution anisotropic three-dimensional data in which the resolution in the x-axis direction is lower than the resolution in each of the y-axis and z-axis directions, are used, the first slice plane direction is a slice plane direction parallel to the z-axis direction and the x-axis direction, and the second slice plane direction is a slice plane direction parallel to the y-axis direction and the z-axis direction.
- The method of generating a trained model according to claim 7 or 8, wherein, as the learning data, x-axis low-resolution anisotropic three-dimensional data in which the resolution in the x-axis direction among the three orthogonal axes of x, y, and z is lower than the resolution in each of the y-axis and z-axis directions, and z-axis low-resolution anisotropic three-dimensional data in which the resolution in the z-axis direction is lower than the resolution in each of the x-axis and y-axis directions, are used, the first slice plane direction is a slice plane direction parallel to the y-axis direction and the z-axis direction, and the second slice plane direction is a slice plane direction parallel to the x-axis direction and the y-axis direction.
- The method of generating a trained model according to any one of claims 7 to 11, wherein the computer selectively switches between the first discriminator and the second discriminator used for authenticity discrimination of the three-dimensional generated image of the second domain according to the resolution condition of the input learning data.
- The method of generating a trained model according to any one of claims 1 to 12, wherein, as the three-dimensional image captured under the first imaging condition, anisotropic three-dimensional data in which the resolution in one axis direction among three orthogonal axes is lower than the resolution in the other two axis directions is used.
- The method of generating a trained model according to claim 13, further comprising, by the computer: performing first isotropization processing of converting the three-dimensional image captured under the first imaging condition into isotropic three-dimensional data having equal resolution in each of the three orthogonal axis directions; and inputting the isotropic three-dimensional data converted by the first isotropization processing to the first generator.
- The method of generating a trained model according to any one of claims 1 to 14, wherein the first generator receives an input of isotropic three-dimensional data having equal resolution in each of the three orthogonal axis directions, and outputs isotropic three-dimensional data as the three-dimensional generated image.
- The method of generating a trained model according to any one of claims 1 to 15, wherein the learning model further includes: a second generator configured using a three-dimensional convolutional neural network that receives an input of a three-dimensional image of the second domain and outputs a three-dimensional generated image of the first domain; and a third discriminator configured using a two-dimensional convolutional neural network that receives an input of a two-dimensional image representing a cross-sectional image in a specific slice plane direction cut out from the three-dimensional generated image of the first domain generated by the second generator, and discriminates the authenticity of the input two-dimensional image, and the learning processing includes processing of adversarially training the second generator and the third discriminator.
- The method of generating a trained model according to claim 16, further comprising, by the computer: performing third clipping processing of cutting out a two-dimensional image representing a cross-sectional image in the specific slice plane direction from the three-dimensional generated image of the first domain generated by the second generator; and inputting the two-dimensional image cut out by the third clipping processing to the third discriminator.
- The method of generating a trained model according to claim 16 or 17, further comprising, by the computer, performing: processing of calculating a first reconstruction loss of conversion processing using the first generator and the second generator in this order, based on a first reconstruction generated image output from the second generator by inputting the three-dimensional generated image of the second domain output from the first generator to the second generator; and processing of calculating a second reconstruction loss of conversion processing using the second generator and the first generator in this order, based on a second reconstruction generated image output from the first generator by inputting the three-dimensional generated image of the first domain output from the second generator to the first generator.
- The method of generating a trained model according to claim 18, further comprising, by the computer: performing first average pooling processing of converting the first reconstruction generated image into three-dimensional data having the same resolution as the original learning data used for the input to the first generator when generating the first reconstruction generated image; and calculating the first reconstruction loss based on the three-dimensional data converted by the first average pooling processing and the original learning data used for the input to the first generator.
- The method of generating a trained model according to claim 18 or 19, further comprising, by the computer: performing second average pooling processing of converting the second reconstruction generated image into three-dimensional data having the same resolution as the original learning data used for the input to the second generator when generating the second reconstruction generated image; and calculating the second reconstruction loss based on the three-dimensional data converted by the second average pooling processing and the original learning data used for the input to the second generator.
- The method of generating a trained model according to any one of claims 16 to 20, wherein the learning model further includes a fourth discriminator configured using a two-dimensional convolutional neural network that receives an input of a two-dimensional image representing a cross-sectional image in a slice plane direction orthogonal to the specific slice plane direction cut out from the three-dimensional generated image of the first domain generated by the second generator, and discriminates the authenticity of the input two-dimensional image, and the learning processing includes processing of adversarially training the second generator and the fourth discriminator.
- The method of generating a trained model according to claim 21, further comprising, by the computer: performing fourth clipping processing of cutting out a two-dimensional image representing a cross-sectional image in the slice plane direction orthogonal to the specific slice plane direction from the three-dimensional generated image of the first domain generated by the second generator; and inputting the two-dimensional image cut out by the fourth clipping processing to the fourth discriminator.
- The method of generating a trained model according to claim 21 or 22, wherein the specific slice plane direction is the first slice plane direction.
- The method of generating a trained model according to any one of claims 16 to 23, further comprising, by the computer: performing second isotropization processing of converting the three-dimensional image captured under the second imaging condition into isotropic three-dimensional data having equal resolution in each of the three orthogonal axis directions; and inputting the isotropic three-dimensional data converted by the second isotropization processing to the second generator.
- The method of generating a trained model according to any one of claims 1 to 24, wherein the first imaging condition corresponds to the first domain, and the second imaging condition corresponds to the second domain.
- The method of generating a trained model according to claim 25, wherein the three-dimensional image captured under the first imaging condition is a first modality image captured using a first modality that is a medical device, the three-dimensional image captured under the second imaging condition is a second modality image captured using a second modality that is a medical device of a type different from the first modality, and the learning model is trained to receive an input of the first modality image and generate a pseudo second-modality generated image having the features of an image captured using the second modality.
- The method of generating a trained model according to any one of claims 1 to 24, wherein the first domain is a first resolution, and the second domain is a second resolution higher than the first resolution.
- The method of generating a trained model according to claim 27, wherein the three-dimensional image captured under the first imaging condition is first-axis low-resolution three-dimensional data in which the resolution in a first axis direction among three orthogonal axes is lower than the resolution in each of the other two axis directions, the three-dimensional image captured under the second imaging condition is second-axis low-resolution three-dimensional data in which the resolution in a second axis direction, different from the first axis direction, among the three orthogonal axes is lower than the resolution in the other two axis directions, and the learning model is trained to receive an input of at least one of the first-axis low-resolution three-dimensional data and the second-axis low-resolution three-dimensional data and generate isotropic three-dimensional data having a higher resolution than the input three-dimensional data.
- The method of generating a trained model according to claim 27 or 28, further comprising, by the computer: performing low-resolution processing of reducing the resolution of the three-dimensional generated image of the first domain generated by the first generator; and calculating, based on the reconstruction generated image obtained by the low-resolution processing, a reconstruction loss of the image conversion by the super-resolution processing of the first generator and the low-resolution processing.
- A machine learning system that trains a learning model that converts the domain of an input three-dimensional image and generates a three-dimensional generated image of a different domain, the system comprising: at least one first processor; and at least one first storage device in which a program executed by the at least one first processor is stored, wherein the learning model has the structure of a generative adversarial network including: a first generator configured using a three-dimensional convolutional neural network that receives an input of a three-dimensional image of a first domain and outputs a three-dimensional generated image of a second domain different from the first domain; and a first discriminator configured using a two-dimensional convolutional neural network that receives an input of a two-dimensional image representing a cross-sectional image in a first slice plane direction cut out from the three-dimensional generated image of the second domain generated by the first generator, and discriminates the authenticity of the input two-dimensional image, and the at least one processor, by executing instructions of the program: acquires a plurality of pieces of learning data including a three-dimensional image captured under a first imaging condition and a three-dimensional image captured under a second imaging condition different from the first imaging condition; and performs learning processing of adversarially training the first generator and the first discriminator based on the plurality of pieces of learning data.
- A program that causes a computer to execute processing of training a learning model that converts the domain of an input three-dimensional image and generates a three-dimensional generated image of a different domain, wherein the learning model has the structure of a generative adversarial network including: a first generator configured using a three-dimensional convolutional neural network that receives an input of a three-dimensional image of a first domain and outputs a three-dimensional generated image of a second domain different from the first domain; and a first discriminator configured using a two-dimensional convolutional neural network that receives an input of a two-dimensional image representing a cross-sectional image in a first slice plane direction cut out from the three-dimensional generated image of the second domain generated by the first generator, and discriminates the authenticity of the input two-dimensional image, the program causing the computer to: acquire a plurality of pieces of learning data including a three-dimensional image captured under a first imaging condition and a three-dimensional image captured under a second imaging condition different from the first imaging condition; and execute learning processing of adversarially training the first generator and the first discriminator based on the plurality of pieces of learning data.
- A non-transitory computer-readable recording medium on which the program according to claim 31 is recorded.
- A medical image processing apparatus comprising: a second storage device that stores a first trained model that is the trained first generator obtained by performing the method of generating a trained model according to any one of claims 1 to 29; and a second processor that performs image processing using the first trained model, wherein the first trained model is a model trained to receive an input of a first medical image and output a second medical image of a domain different from the first medical image.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022578244A JPWO2022163402A1 (ja) | 2021-01-26 | 2022-01-17 | |
EP22745625.8A EP4287114A1 (en) | 2021-01-26 | 2022-01-17 | Learned model generation method, machine learning system, program, and medical image processing device |
US18/357,986 US20230368442A1 (en) | 2021-01-26 | 2023-07-24 | Method of generating trained model, machine learning system, program, and medical image processing apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021010459 | 2021-01-26 | ||
JP2021-010459 | 2021-05-19 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/357,986 Continuation US20230368442A1 (en) | 2021-01-26 | 2023-07-24 | Method of generating trained model, machine learning system, program, and medical image processing apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022163402A1 true WO2022163402A1 (ja) | 2022-08-04 |
Family
ID=82654658
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/001351 WO2022163402A1 (ja) | 2021-01-26 | 2022-01-17 | 学習済みモデルの生成方法、機械学習システム、プログラムおよび医療画像処理装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230368442A1 (ja) |
EP (1) | EP4287114A1 (ja) |
JP (1) | JPWO2022163402A1 (ja) |
WO (1) | WO2022163402A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117745725B (zh) * | 2024-02-20 | 2024-05-14 | 阿里巴巴达摩院(杭州)科技有限公司 | 图像处理方法、图像处理模型训练方法、三维医学图像处理方法、计算设备及存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018505705A (ja) * | 2014-12-10 | 2018-03-01 | コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. | 機械学習を用いた医用イメージングの変換のためのシステムおよび方法 |
JP2019149094A (ja) | 2018-02-28 | 2019-09-05 | 富士フイルム株式会社 | 診断支援システム、診断支援方法、及びプログラム |
JP6583875B1 (ja) | 2019-06-20 | 2019-10-02 | Psp株式会社 | 画像処理方法、画像処理システム及び画像処理プログラム |
JP2019534763A (ja) * | 2016-09-06 | 2019-12-05 | エレクタ、インク.Elekta, Inc. | 合成医療画像を生成するためのニューラルネットワーク |
JP2020054579A (ja) * | 2018-10-01 | 2020-04-09 | 富士フイルム株式会社 | 疾患領域抽出装置、方法及びプログラム |
-
2022
- 2022-01-17 EP EP22745625.8A patent/EP4287114A1/en active Pending
- 2022-01-17 JP JP2022578244A patent/JPWO2022163402A1/ja active Pending
- 2022-01-17 WO PCT/JP2022/001351 patent/WO2022163402A1/ja active Application Filing
-
2023
- 2023-07-24 US US18/357,986 patent/US20230368442A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018505705A (ja) * | 2014-12-10 | 2018-03-01 | コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. | 機械学習を用いた医用イメージングの変換のためのシステムおよび方法 |
JP2019534763A (ja) * | 2016-09-06 | 2019-12-05 | エレクタ、インク.Elekta, Inc. | 合成医療画像を生成するためのニューラルネットワーク |
JP2019149094A (ja) | 2018-02-28 | 2019-09-05 | 富士フイルム株式会社 | 診断支援システム、診断支援方法、及びプログラム |
JP2020054579A (ja) * | 2018-10-01 | 2020-04-09 | 富士フイルム株式会社 | 疾患領域抽出装置、方法及びプログラム |
JP6583875B1 (ja) | 2019-06-20 | 2019-10-02 | Psp株式会社 | 画像処理方法、画像処理システム及び画像処理プログラム |
Non-Patent Citations (1)
Title |
---|
YAMASOBA CHIKATO, TOZAKI TETSUYA, SENDA MICHIO: "Generation and Evaluation of Another Modality of Medical Images Based on GAN", 2020 INFORMATION SCIENCE AND TECHNOLOGY FORUM (FIT), IEICE, JP, 18 August 2020 (2020-08-18) - 3 September 2020 (2020-09-03), JP, pages 277 - 278, XP055954730 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022163402A1 (ja) | 2022-08-04 |
US20230368442A1 (en) | 2023-11-16 |
EP4287114A1 (en) | 2023-12-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Beers et al. | High-resolution medical image synthesis using progressively grown generative adversarial networks | |
Du et al. | Super-resolution reconstruction of single anisotropic 3D MR images using residual convolutional neural network | |
JP7246866B2 (ja) | 医用画像処理装置 | |
Liang et al. | Sketch guided and progressive growing GAN for realistic and editable ultrasound image synthesis | |
Gerig et al. | Symbolic description of 3-D structures applied to cerebral vessel tree obtained from MR angiography volume data | |
CN109978037B (zh) | 图像处理方法、模型训练方法、装置、和存储介质 | |
CN109754394B (zh) | 三维医学图像处理装置及方法 | |
CN111368849B (zh) | 图像处理方法、装置、电子设备及存储介质 | |
CN111429474B (zh) | 基于混合卷积的乳腺dce-mri图像病灶分割模型建立及分割方法 | |
KR20200130374A (ko) | 두꺼운 이미지 슬라이스들로부터 얇은 이미지 슬라이스들을 생성하기 위한 시스템들 및 방법들 | |
AlZu'bi et al. | Transferable hmm trained matrices for accelerating statistical segmentation time | |
Du et al. | Accelerated super-resolution MR image reconstruction via a 3D densely connected deep convolutional neural network | |
Feng et al. | Brain MRI super-resolution using coupled-projection residual network | |
Chen et al. | Generative adversarial U-Net for domain-free medical image augmentation | |
JP7423338B2 (ja) | 画像処理装置及び画像処理方法 | |
US20230368442A1 (en) | Method of generating trained model, machine learning system, program, and medical image processing apparatus | |
JP2022077991A (ja) | 医用画像処理装置、医用画像処理方法、医用画像処理プログラム、モデルトレーニング装置、およびトレーニング方法 | |
Angermann et al. | Projection-based 2.5 d u-net architecture for fast volumetric segmentation | |
US20240005498A1 (en) | Method of generating trained model, machine learning system, program, and medical image processing apparatus | |
WO2020175445A1 (ja) | 学習方法、学習装置、生成モデル及びプログラム | |
Iddrisu et al. | 3D reconstructions of brain from MRI scans using neural radiance fields | |
Karthik et al. | Automatic quality enhancement of medical diagnostic scans with deep neural image super-resolution models | |
Ma et al. | A frequency domain constraint for synthetic and real x-ray image super resolution | |
CN114049334A (zh) | 一种以ct图像为输入的超分辨率mr成像方法 | |
Feng et al. | Coupled-projection residual network for mri super-resolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22745625 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022578244 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022745625 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2022745625 Country of ref document: EP Effective date: 20230828 |