US20190371018A1 - Method for processing sparse-view computed tomography image using neural network and apparatus therefor - Google Patents

Method for processing sparse-view computed tomography image using neural network and apparatus therefor

Info

Publication number: US20190371018A1
Authority: US (United States)
Prior art keywords: neural network, equation, frame, sparse, image processing
Legal status: Granted; currently Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: US16/365,498
Other versions: US10991132B2
Inventors: JongChul YE, YoSeob HAN
Current Assignee: Korea Advanced Institute of Science and Technology (KAIST) (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Korea Advanced Institute of Science and Technology (KAIST)
Application filed by: Korea Advanced Institute of Science and Technology (KAIST)
Assigned to: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY; Assignors: HAN, YOSEOB; YE, JONGCHUL
Publications: US20190371018A1 (application); US10991132B2 (granted patent)

Classifications

    • G06T 5/00: Image enhancement or restoration
    • G06T 11/003: Reconstruction from projections, e.g. tomography
    • G06T 11/006: Inverse problem, transformation from projection-space into object-space, e.g. transform methods, back-projection, algebraic methods
    • G06T 11/008: Specific post-processing after tomographic reconstruction, e.g. voxelisation, metal artifact correction
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06T 2207/10081: Computed x-ray tomography [CT]
    • G06T 2207/20064: Wavelet transform [DWT]
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2211/436: Limited angle
    • G06T 2211/441: AI-based methods, deep learning or artificial neural networks

Definitions

  • Embodiments of the inventive concept described herein relate to a method for processing images using a neural network and an apparatus therefor, and more particularly, relate to a method for reconstructing a sparse-view computed tomography (CT) image as a high-quality image using a neural network for a learning model satisfying a predetermined condition and an apparatus therefor.
  • CT is an imaging technique that transmits X-rays through an object, obtains the attenuated X-rays, and reconstructs CT images from the obtained measurements. Because CT uses X-rays, radiation exposure has emerged as a major issue.
  • Various studies have been conducted to address this problem, such as low-dose CT, which reduces the intensity of the X-rays, and interior tomography, which irradiates X-rays only to a local area to generate CT images. Furthermore, there is sparse-view CT, which reduces the number of projection views as a way of reducing the X-ray dose.
  • Sparse-view CT is a method that lowers the radiation dose by reducing the number of projection views. While sparse-view CT may not be useful for existing multi-detector CT (MDCT) due to its fast and continuous acquisition of projection views, there are many new applications of sparse-view CT, such as spectral CT using alternating kVp switching, dynamic beam blockers, and the like. Moreover, in C-arm CT or dental CT applications, the scan time is limited primarily by the relatively slow speed of the flat-panel detector rather than by the mechanical gantry speed, so sparse-view CT offers an opportunity to reduce the scan time.
  • Embodiments of the inventive concept provide a method for reconstructing a sparse-view CT image as a high-quality image using a neural network for a learning model satisfying a predetermined frame condition and an apparatus therefor.
  • an image processing method may include receiving sparse-view computed tomography (CT) data and reconstructing an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.
  • the reconstructing of the image may include reconstructing the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
  • the neural network may include a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is learned by the learning model.
  • the neural network may include a multi-resolution neural network including pooling and unpooling layers.
  • the neural network may include a structured dual frame neural network, obtained by expressing a mathematical expression of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • the neural network may include a by-pass connection from the pooling layer to the unpooling layer.
  • an image processing method may include receiving sparse-view CT data and reconstructing an image for the sparse-view CT data using a neural network for a learning model which satisfies a predetermined frame condition and is based on convolutional framelets.
  • the neural network may include a multi-resolution neural network including pooling and unpooling layers.
  • the neural network may include a structured dual frame neural network, obtained by expressing a mathematical expression of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • an image processing device may include a reception unit configured to receive sparse-view CT data and a reconstruction unit configured to reconstruct an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.
  • the reconstruction unit may be configured to reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
  • the neural network may include a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is learned by the learning model.
  • the neural network may include a multi-resolution neural network including pooling and unpooling layers.
  • the neural network may include a structured dual frame neural network, obtained by expressing a mathematical expression of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • the neural network may include a by-pass connection from the pooling layer to the unpooling layer.
  • FIG. 1 are drawings illustrating an example of CT streaking artifact patterns in the reconstruction images from 48 projection views
  • FIG. 2 are drawings illustrating an example of comparing sizes of receptive fields according to a structure of a network or a neural network
  • FIG. 3 is an operational flowchart illustrating an image processing method according to an embodiment of the inventive concept
  • FIG. 4 are drawings illustrating a simplified U-Net architecture, a dual frame U-Net architecture, and a tight frame U-Net architecture;
  • FIGS. 5A, 5B, and 5C are drawings illustrating a standard U-Net architecture, a dual frame U-Net architecture, and a tight frame U-Net architecture;
  • FIG. 6 are drawings illustrating an example of reconstruction results by general, dual frame, and tight frame U-Nets at various sparse view reconstruction
  • FIG. 7 is a block diagram illustrating a configuration of an image processing device according to an embodiment of the inventive concept.
  • the convolutional framelet may be represented for the input signal f using the local basis ⁇ j and the non-local basis ⁇ i and may be represented as Equation 1 below.
  • ⁇ i denotes the linear transform operator with the non-local basis vector
  • ⁇ j denotes the linear transform operator with the local basis vector
  • the local basis vector and the non-local basis vector may have the dual basis vectors φ̃_i and ψ̃_j, respectively, which are orthogonal to each other.
  • the orthogonal relationship between the basis vectors may be defined as Equation 2 below.
  • using Equation 2 above, the convolutional framelet may be represented as Equation 3 below.
  • ℍ_d denotes the Hankel matrix operator, which may allow the convolutional operation to be represented as the matrix multiplication
  • C denotes the convolutional framelet coefficient which is the signal transformed by the local basis and the non-local basis.
  • the convolutional framelet coefficient C may be reconstructed as the original signal by applying the dual basis vectors φ̃_i, ψ̃_j.
  • the reconstruction process may be represented as Equation 4 below.
  • the technique of representing the input signal through the local basis and the non-local basis may be the convolutional framelet.
  • One of the key ingredients for the deep convolutional framelets may be the frame condition for the non-local basis.
  • the existing neural network architecture, for example, the U-Net architecture, does not satisfy the frame condition and overly emphasizes the low-frequency component of the signal.
  • this artifact may be manifested as blurring artifacts in the reconstructed images.
  • An embodiment of the inventive concept may provide two types of novel network architectures that satisfy the frame condition and may provide a dual frame network and a tight frame network.
  • the dual frame network may use a by-pass connection in the low-resolution path to generate a residual signal.
  • the tight frame network with an orthogonal wavelet basis, for example, the Haar wavelet basis, may be implemented by adding a high-frequency path to the existing U-Net structure.
  • R(A) denotes the range space of A
  • P_R(A) denotes the projection onto the range space of A.
  • the identity matrix is referred to as I.
  • the notation A† refers to the generalized inverse matrix.
  • the superscript T of A^T denotes the Hermitian transpose.
  • a vector ῡ ∈ ℝ^n refers to the flipped version of a vector υ ∈ ℝ^n, i.e., its indices are reversed.
  • an embodiment of the inventive concept may define Ψ̄ as Equation 5 below.
  • a family of functions ⁇ k ⁇ k ⁇ in a Hilbert space H is called a frame when it satisfies the following inequality of Equation 6 below.
  • the frame bounds may be represented by Equation 8 below.
  • ⁇ min (A) and ⁇ max (A) denote the minimum and maximum singular values of A, respectively.
  • the explicit form of the dual frame may be given by the pseudo-inverse as Equation 10 below.
  • the noise amplification factor may be computed by Equation 11 below.
  • ⁇ ( ⁇ ) refers to the condition number
  • an embodiment of the inventive concept may be mainly derived using the circular convolution.
  • an embodiment of the inventive concept may consider 1-D signal processing, but the extension to 2-D signal processing may be straightforward.
  • a wrap-around Hankel matrix ℍ_d(f) may be defined by Equation 13 below.
  • ℍ_d(f) = [ f[1]   f[2]   ⋯   f[d]
               f[2]   f[3]   ⋯   f[d+1]
                ⋮      ⋮            ⋮
               f[n]   f[1]   ⋯   f[d-1] ]    [Equation 13]
  • d denotes the matrix pencil parameter
  • When a multi-channel signal is given as Equation 14 below, an extended Hankel matrix may be constructed by stacking Hankel matrices.
  • the extended Hankel matrix may be represented as Equation 15 below.
  • a single-input single-output (SISO) convolution in CNN may be represented as Equation 16 below using a Hankel matrix.
  • q denotes the number of output channels.
  • a multi-input multi-output (MIMO) convolution in CNN may be represented by Equation 18 below.
  • p and q refer to the number of input and output channels, respectively.
  • the j-th input channel filter may be represented as Equation 19 below.
  • the extension to the multi-channel 2-D convolution operation for an image domain CNN may be straightforward, since similar matrix vector operations may also be used. That is, the extension to the multi-channel 2-D convolution operation for the image domain CNN may be changed in only the definition of the Hankel matrices, which is defined as block Hankel matrix, or the extended Hankel matrices.
  • One of the most interesting properties of the Hankel matrix is that it often has a low-rank structure and its low-rankness is related to the sparsity in the Fourier domain. This property is extremely useful, as evidenced by its applications to many inverse problems and low-level computer vision problems.
  • An embodiment of the inventive concept may briefly review the theory of deep convolutional framelets. Using the existing Hankel matrix approaches, an embodiment of the inventive concept may consider the following regression problem as Equation 20 below.
  • f* ∈ ℝ^n denotes the ground-truth signal.
  • The classical approach to address the problem of Equation 20 above is to use singular value shrinkage or matrix factorization. However, in deep convolutional framelets, the problem is addressed using learning-based signal representation.
  • Σ = (σ_ij) ∈ ℝ^{r×r} is the diagonal matrix with singular values.
  • An embodiment of the inventive concept may consider the matrix pairs Φ, Φ̃ ∈ ℝ^{n×n} satisfying the frame condition represented in Equation 21 below.
  • These bases are referred to as non-local bases since they interact with all the n elements of f ∈ ℝ^n by multiplying them to the left of ℍ_d(f) ∈ ℝ^{n×d}.
  • An embodiment of the inventive concept may need another matrix pair Ψ, Ψ̃ ∈ ℝ^{d×r} satisfying the low dimensional subspace constraint represented in Equation 22 below.
  • Using Equations 21 and 22 above, an embodiment of the inventive concept may obtain the Hankel matrix reconstruction for the input signal f as Equation 23 below.
  • Here, ℍ_d denotes the Hankel matrix reconstruction operator.
  • Factorizing Φ^T ℍ_d(f) Ψ from Equation 23 above results in the decomposition of f using a single-layer encoder-decoder architecture as Equation 24 below.
  • The encoder and decoder convolution filters are given by Equation 25 below.
  • Equation 24 above is the general form of the signals that are associated with a rank-r Hankel structured matrix, and an embodiment of the inventive concept is interested in specifying bases for optimal performance.
  • In the deep convolutional framelets, Φ and Φ̃ may correspond to the user-defined generalized pooling and unpooling to satisfy the frame condition of Equation 21 above.
  • On the other hand, the filters Ψ, Ψ̃ need to be estimated from the data.
  • to limit the search space for the filters, an embodiment of the inventive concept may consider ℋ_0, which consists of signals that have positive framelet coefficients. ℋ_0 may be represented as Equation 26 below.
  • Equation 27 may be represented as Equation 28 below.
  • C may be represented as Equation 30 below.
  • ρ(⋅) refers to the rectified linear unit (ReLU) to impose the positivity for the framelet coefficients.
  • the input image f (i) of sparse-view CT may be represented as Equation 31 below.
  • h (i) denotes the streaking artifacts and f* (i) refers to the artifact-free ground-truth image.
  • the residual network training may be formulated as Equation 32 below.
  • the residual learning scheme is to find the filter ⁇ which approximately annihilates the true signal f* (i) as Equation 33 below.
  • the signal decomposition using deep convolutional framelets may be applied for the streaking artifact signal as Equation 34 below.
  • Equation 34 above may come from Equation 35 below thanks to the annihilating property of Equation 29 above.
  • the neural network is trained to learn the structure of the true image so that it can annihilate the true image component while still retaining the artifact signals.
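  • The following PyTorch-style sketch illustrates the residual learning objective described above: the network is trained to output the streaking artifact h = f - f*, and the restored image is obtained by subtracting the prediction from the input. The model, tensor names, and optimizer interface are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Minimal sketch of the residual (artifact) learning objective of Equations 31-32,
# assuming `model` is any image-to-image network (e.g., a U-Net variant) and that
# `fbp` / `gt` are mini-batches of sparse-view FBP reconstructions and ground-truth
# images of shape (N, 1, H, W). All names here are illustrative, not from the patent.
def residual_training_step(model: nn.Module, fbp: torch.Tensor, gt: torch.Tensor,
                           optimizer: torch.optim.Optimizer) -> float:
    artifact_target = fbp - gt          # h = f - f*: the streaking-artifact image
    artifact_pred = model(fbp)          # the network regresses the artifact, not the image
    loss = nn.functional.mse_loss(artifact_pred, artifact_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# At inference time the artifact estimate is subtracted from the input:
# restored = fbp - model(fbp)
```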
  • the l-th layer encoder and decoder filters may be defined by Equations 38 and 39 below.
  • d^(l), p^(l), and q^(l) denote the filter length and the numbers of input and output channels, respectively.
  • an embodiment of the inventive concept may obtain the deep convolution framelet extension and the associated training scheme.
  • the non-local bases Φ^T and Φ̃ correspond to the generalized pooling and unpooling operations, while the local bases Ψ and Ψ̃ work as learnable convolutional filters.
  • the frame condition may be the most important prerequisite for enabling the recovery condition and controllable shrinkage behavior, which is the main criterion for constructing the U-Net variants.
  • FIG. 3 is an operational flowchart illustrating an image processing method according to an embodiment of the inventive concept.
  • the image processing method may include receiving (S 310 ) sparse-view CT data and reconstructing (S 320 ) an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.
  • operation S 320 may be to reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
  • the neural network used in an embodiment of the inventive concept may include a neural network which generates a learning model which satisfies the frame condition through a mathematical analysis based on the convolutional framelets and is learned by the learning model and may include a multi-resolution neural network including pooling and unpooling layers.
  • the neural network may include a structured dual frame neural network, obtained by expressing the mathematical expression of the multi-resolution neural network as the dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • the neural network may include a by-pass connection from the pooling layer to the unpooling layer.
  • FIG. 1 are drawings illustrating an example of CT streaking artifact patterns in the reconstruction images from 48 projection views.
  • (a) and (b) of FIG. 1 show two reconstruction images and their artifact-only images when only 48 projection views are available.
  • FIG. 2 are drawings illustrating an example of comparing sizes of receptive fields according to a structure of a network or a neural network.
  • (a) and (b) of FIG. 2 compare the size of the receptive field of a multi-resolution network U-Net ((b) of FIG. 2 ) and a single resolution CNN ((a) of FIG. 2 ) without pooling layers.
  • the receptive field becomes larger and larger as the signal passes through each convolution layer (Conv).
  • however, because the rate of increase is not high, a very deep neural network architecture is required for the receptive field to cover the entire image.
  • therefore, a neural network architecture with a large receptive field, typically a multi-resolution architecture such as the U-Net architecture, is mainly used, for example, when performing segmentation in an image.
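  • The following illustrative receptive-field arithmetic (not taken from the patent) shows why: with 3×3 convolutions alone the receptive field grows only linearly with depth, whereas each pooling layer multiplies the contribution of all subsequent layers.

```python
# Illustrative receptive-field arithmetic (assumed 3x3 convolutions and 2x2 pooling):
# each 3x3 convolution adds 2 pixels at the current resolution, while every 2x2
# pooling doubles the effective step ("jump") of all later layers.
def receptive_field(layers):
    """layers: sequence of 'conv' (3x3) or 'pool' (2x2) strings."""
    rf, jump = 1, 1
    for layer in layers:
        if layer == 'conv':
            rf += 2 * jump
        elif layer == 'pool':
            jump *= 2
    return rf

print(receptive_field(['conv'] * 10))                            # 21: grows only linearly
print(receptive_field(['conv', 'conv', 'pool'] * 3 + ['conv']))  # 45: far larger with pooling
```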
  • FIG. 4 shows a simplified U-Net architecture.
  • the U-Net plays a role in delivering the signal from the input to the output by using the average pooling layer and the average unpooling layer as the non-local bases and through the by-pass connection expressed by a dotted line.
  • the U-Net is recursively applied to the low-resolution signal.
  • the input f ∈ ℝ^n is first filtered with the local convolutional filter Ψ and is then reduced to a half-size approximate signal using a pooling operation Φ.
  • this step may be represented by Equation 40 below.
  • f ⊛ Ψ denotes the multi-channel convolution in CNN.
  • Φ^T denotes a pooling operator, and the pooling operator is given by Equation 41 below.
  • the U-Net has the by-pass connection to compensate for the lost high frequency during pooling.
  • the convolutional framelet coefficients may be represented by Equation 42 below.
  • ⁇ ext T refers to the extended pooling
  • B refers to the bypass component
  • S refers to the low pass subband.
  • using Equation 43 above, B and S may be represented as Equation 44 below.
  • Equation 45 may be derived using the above-mentioned equations.
  • note that Φ̃Φ^T = P_R(Φ) for the case of average pooling.
  • as a result, a blurred reconstruction signal may be generated.
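  • The following minimal numpy sketch illustrates this effect, assuming a simple 1/2-scaled average pooling operator and nearest-neighbour unpooling (the exact operator of Equation 41 may be scaled differently): the composition of pooling and unpooling is a projection rather than the identity, so high-frequency content is lost.

```python
import numpy as np

# Average pooling Phi^T halves the resolution; unpooling duplicates each low-resolution
# sample. Their composition is a projection, not the identity, which is why a
# pooling/unpooling path alone blurs the signal. (Scaling is an assumption.)
n = 8
PhiT = np.zeros((n // 2, n))
for k in range(n // 2):
    PhiT[k, 2 * k] = PhiT[k, 2 * k + 1] = 0.5      # average pooling rows

Phi_unpool = 2.0 * PhiT.T                           # nearest-neighbour unpooling

f = np.arange(1.0, n + 1.0)                         # a simple ramp signal
f_lowpass = Phi_unpool @ (PhiT @ f)                 # unpool(pool(f))

print(f)          # [1. 2. 3. 4. 5. 6. 7. 8.]
print(f_lowpass)  # [1.5 1.5 3.5 3.5 5.5 5.5 7.5 7.5]  piecewise-constant: high frequencies lost
```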
  • the dual frame for Φ_ext in Equation 43 above may be obtained as Equation 46 below.
  • an embodiment of the inventive concept may then obtain Equation 47 below.
  • the dual frame may be given by Equation 48 below.
  • For a given framelet coefficient C_ext in Equation 42 above, the reconstruction using the dual frame may be given by Equation 49 below.
  • the final step of the dual frame U-Net is the concatenation and the multi-channel convolution, which is equivalent to applying the inverse Hankel operation, i.e., ℍ_d†(⋅), to the processed framelet coefficients multiplied with the local basis.
  • the concatenated signal may be given by Equation 50 below.
  • the final convolution may be equivalently computed by Equation 51 below.
  • the non-local basis Φ^T may be composed of a filter bank as represented in Equation 52 below.
  • T_k denotes the k-th subband operator.
  • The convolutional framelet coefficients including a by-pass connection may be written by Equation 54 below.
  • where Φ_ext = [I  T_1 ⋯ T_L]^T, B = f ⊛ Ψ, and S_k = T_k^T C.
  • Φ_ext is also a tight frame by Equation 55 below.
  • T_1 is the low-pass subband, which is equivalent to the average pooling in Equation 37 above, and T_2 is the high-pass subband filter.
  • the high-pass filtering T_2 may be given by Equation 56 below.
  • T_2 = (1/2) [ 1  -1   0   0   ⋯   0   0
                 0   0   1  -1   ⋯   0   0
                 ⋮                         ⋮
                 0   0   0   0   ⋯   1  -1 ]    [Equation 56]
  • T_1T_1^T + T_2T_2^T = I; so the Haar wavelet frame is tight.
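  • The following numpy check illustrates the statement above, assuming the 1/2 scaling written in Equation 56 (the patent's exact normalization may differ by a constant): the low-pass and high-pass subband operators together form a tight filter bank, and keeping both subbands allows the input to be recovered exactly up to a known constant, unlike the pooling-only path.

```python
import numpy as np

# Numpy check of the Haar filter-bank identity stated above (1/2 scaling assumed).
n = 8
T1 = np.zeros((n // 2, n))   # low-pass subband (average pooling)
T2 = np.zeros((n // 2, n))   # high-pass subband
for k in range(n // 2):
    T1[k, 2 * k], T1[k, 2 * k + 1] = 0.5, 0.5
    T2[k, 2 * k], T2[k, 2 * k + 1] = 0.5, -0.5

print(np.allclose(T1 @ T1.T + T2 @ T2.T, np.eye(n // 2)))   # True: the frame is tight

# With both subbands kept (as in the tight frame U-Net), the input is recovered
# exactly up to a known constant, unlike the pooling-only path:
f = np.random.default_rng(1).standard_normal(n)
f_rec = 2.0 * (T1.T @ (T1 @ f) + T2.T @ (T2 @ f))
print(np.allclose(f_rec, f))                                 # True
```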
  • in the corresponding tight frame U-Net structure illustrated in (c) of FIG. 4, in contrast to the U-Net structure in (a) of FIG. 4, there is an additional high-pass branch.
  • each subband signal is by-passed to its individual concatenation layer.
  • the convolutional layer after the concatenation layers may provide a weighted sum whose weights are learned from data. This simple fix makes the frame tight.
  • the neural network of the tight frame U-Net structure shown in (c) of FIG. 4 may be expressed as a neural network which has a tight filter bank or wavelet as the non-local basis so as to satisfy the frame condition of the convolutional framelets.
  • the non-local basis of the tight frame U-Net may satisfy the tight frame.
  • the nonlinear operation may impose sparsity on the signal for various input signals f or may impose positivity on the signal. This may enable the neural network to learn various input signals or transformed signals, and may enable the local and non-local basis vectors of the linear transform operation to find various solutions. Moreover, the nonlinear operation may be constructed in a form that satisfies the reconstruction condition.
  • the residual learning is applicable to the neural network to enhance the learning effect.
  • the residual learning may make the local basis of the linear transform lower rank, so the unnecessary load of the neural network may be greatly reduced.
  • Such an internal by-pass connection or an external by-pass connection may overcome the difficulty of the deep network training to improve the performance of removing the local noise and the non-local noise.
  • FIGS. 5A to 5C are drawings illustrating a standard U-Net architecture ( 5 A), a dual frame U-Net architecture ( 5 B), and a tight frame U-Net architecture ( 5 C).
  • each network may include a convolution layer for performing the linear transform operation, a batch normalization layer for performing the normalization operation, a rectified linear unit (ReLU) layer for performing the nonlinear function operation, and a path connection with concatenation.
  • each stage may include four sequential layers composed of convolution with 3 ⁇ 3 kernels, batch normalization, and ReLU layers.
  • the last stage may include two sequential layers and the last layer.
  • the last layer may include only the convolution layer with 1 ⁇ 1 kernel.
  • the number of channels for each convolution layer is illustrated in FIGS. 5A to 5C .
  • the number of channels may be doubled after each pooling layers.
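  • The following PyTorch sketch shows one possible encoder stage of this kind (3×3 convolution, batch normalization, and ReLU repeated, followed by average pooling, with a by-pass output). The layer counts and channel sizes are assumptions for illustration and are not the exact configuration of FIGS. 5A to 5C.

```python
import torch.nn as nn

# Illustrative PyTorch sketch of one encoder stage as described above.
def conv_bn_relu(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class EncoderStage(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.blocks = nn.Sequential(*[conv_bn_relu(in_ch if i == 0 else out_ch, out_ch)
                                      for i in range(4)])
        self.pool = nn.AvgPool2d(kernel_size=2)   # average pooling as the non-local basis

    def forward(self, x):
        x = self.blocks(x)
        return self.pool(x), x        # pooled output plus the by-pass (skip) signal

stage1 = EncoderStage(1, 64)          # channels double at the next stage: 64 -> 128 -> ...
```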
  • the differences between the U-Net and the dual frame U-Net or the tight frame U-Net are from the pooling and unpooling layers.
  • from a signal processing point of view, the standard U-Net may show a limited level of reconstruction quality.
  • An embodiment of the inventive concept may mathematically prove the limit of the existing U-Net structure and may formulate the theory capable of overcoming the limit based on it, thus providing the dual frame U-Net and the tight frame U-Net which are the neural network architecture satisfying the frame condition.
  • the dual frame U-Net is a structured neural network architecture obtained by expressing the mathematical formulation of the U-Net as a dual frame; it is proposed to have a similar amount of computation to the U-Net structure and to satisfy the frame condition by adding a residual path while maintaining the general U-Net structure.
  • the tight frame U-Net may decompose the low-frequency domain and the high-frequency domain using the wavelet.
  • the low-frequency domain may be decomposed stage by stage to be the same as the operation performed in the general U-Net structure, while the high-frequency domain may be reconstructed without losing the high-frequency signal, by designing the path to pass to the opposite layer.
  • FIG. 6 are drawings illustrating an example of reconstruction results by general, dual frame, and tight frame U-Nets at various sparse view reconstruction.
  • the left box in each image region illustrates the enlarged images, and the right box illustrates the difference images.
  • the number written on each image is the normalized mean square error (NMSE) value.
  • the U-Net produces blurred edge images in many areas, while the dual frame and tight frame U-Nets may enhance the high frequency characteristics of the images.
  • the dual frame U-Net and the tight frame U-Net may reduce the phenomenon in which image details are blurred or flattened, which is the limitation of the general U-Net, and may reconstruct all sparse-view images using a single neural network, without adjusting additional parameters, by simultaneously learning various sparse-view images.
  • FIG. 7 is a block diagram illustrating a configuration of an image processing device according to an embodiment of the inventive concept and illustrates a configuration of a device which performs the methods in FIGS. 3 to 6 .
  • an image processing device 700 may include a reception unit 710 and a reconstruction unit 720 .
  • the reception unit 710 may receive sparse-view CT data.
  • the reconstruction unit 720 may reconstruct an image for the sparse-view CT data using a neural network for a learning model which satisfies a predetermined frame condition and is based on the convolutional framelets.
  • the reconstruction unit 720 may reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by the residual learning.
  • the neural network used in the device may include a neural network which generates a learning model satisfying the frame condition through a mathematical analysis based on the convolutional framelets and is learned by the learning model and may include a multi-resolution neural network including pooling and unpooling layers.
  • the multi-resolution neural network may include a structured dual frame neural network, obtained by expressing the mathematical expression of the multi-resolution neural network as the dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • the neural network may include a by-pass connection from the pooling layer to the unpooling layer.
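  • The following Python sketch shows one possible structure for such a device, with a reception unit and a reconstruction unit wired together; the trained_network interface and all names are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

# Minimal structural sketch of the image processing device of FIG. 7 (reception unit 710
# and reconstruction unit 720). `trained_network` stands for any neural network of a
# learning model satisfying the frame condition (e.g., a dual frame or tight frame U-Net);
# its interface here is an assumption for illustration.
class ReceptionUnit:
    def receive(self, sparse_view_ct_data: np.ndarray) -> np.ndarray:
        return np.asarray(sparse_view_ct_data, dtype=np.float32)

class ReconstructionUnit:
    def __init__(self, trained_network):
        self.network = trained_network        # learned by residual learning

    def reconstruct(self, fbp_image: np.ndarray) -> np.ndarray:
        artifact = self.network(fbp_image)    # network outputs the streaking artifact
        return fbp_image - artifact           # residual learning: image = input - artifact

class ImageProcessingDevice:
    def __init__(self, trained_network):
        self.reception_unit = ReceptionUnit()
        self.reconstruction_unit = ReconstructionUnit(trained_network)

    def process(self, sparse_view_ct_data: np.ndarray) -> np.ndarray:
        data = self.reception_unit.receive(sparse_view_ct_data)
        return self.reconstruction_unit.reconstruct(data)
```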
  • the foregoing devices may be realized by hardware elements, software elements and/or combinations thereof.
  • the devices and components illustrated in the exemplary embodiments of the inventive concept may be implemented in one or more general-use computers or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor or any device which may execute instructions and respond.
  • a processing unit may implement an operating system (OS) or one or more software applications running on the OS. Further, the processing unit may access, store, manipulate, process, and generate data in response to execution of software.
  • the processing unit may include a plurality of processing elements and/or a plurality of types of processing elements.
  • the processing unit may include a plurality of processors or one processor and one controller.
  • the processing unit may have a different processing configuration, such as a parallel processor.
  • Software may include computer programs, codes, instructions or one or more combinations thereof and may configure a processing unit to operate in a desired manner or may independently or collectively control the processing unit.
  • Software and/or data may be permanently or temporarily embodied in any type of machine, components, physical equipment, virtual equipment, computer storage media or units or transmitted signal waves so as to be interpreted by the processing unit or to provide instructions or data to the processing unit.
  • Software may be dispersed throughout computer systems connected via networks and may be stored or executed in a dispersion manner.
  • Software and data may be recorded in one or more computer-readable storage media.
  • the methods according to the above-described exemplary embodiments of the inventive concept may be implemented with program instructions which may be executed through various computer means and may be recorded in computer-readable media.
  • the media may also include, alone or in combination with the program instructions, data files, data structures, and the like.
  • the program instructions recorded in the media may be designed and configured specially for the exemplary embodiments of the inventive concept or be known and available to those skilled in computer software.
  • Computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as compact disc-read only memory (CD-ROM) disks and digital versatile discs (DVDs); magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like.
  • Program instructions include both machine codes, such as produced by a compiler, and higher level codes that may be executed by the computer using an interpreter.
  • the described hardware devices may be configured to act as one or more software modules to perform the operations of the above-described exemplary embodiments of the inventive concept, or vice versa.
  • the image processing device may reconstruct a sparse-view CT image as a high-quality image using the neural network for the learning model satisfying the predetermined frame condition.
  • the image processing device may reconstruct a high-quality image with a similar amount of computation to the existing neural network architecture by mathematically proving the limitation of the existing multi-resolution neural network, for example, the U-Net structure, formulating a theory capable of overcoming that limitation, providing a neural network satisfying the frame condition, and reconstructing the sparse-view CT image by means of that neural network.

Abstract

A method for processing a sparse-view computed tomography (CT) image using a neural network and an apparatus therefor are provided. The method includes receiving sparse-view CT data and reconstructing an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [2016-0-00562(R0124-16-0002), Emotional Intelligence Technology to Infer Human Emotion and Carry on Dialogue Accordingly]. This application claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2018-0060849 filed on May 29, 2018, in the Korean Intellectual Property Office, the disclosures of which are incorporated by reference herein in their entireties.
  • BACKGROUND
  • Embodiments of the inventive concept described herein relate to a method for processing images using a neural network and an apparatus therefor, and more particularly, relate to a method for reconstructing a sparse-view computed tomography (CT) image as a high-quality image using a neural network for a learning model satisfying a predetermined condition and an apparatus therefor.
  • CT is an imaging technique that transmits X-rays through an object, obtains the attenuated X-rays, and reconstructs CT images from the obtained measurements. Because CT uses X-rays, radiation exposure has emerged as a major issue. Various studies have been conducted to address this problem, such as low-dose CT, which reduces the intensity of the X-rays, and interior tomography, which irradiates X-rays only to a local area to generate CT images. Furthermore, there is sparse-view CT, which reduces the number of projection views as a way of reducing the X-ray dose.
  • Sparse-view CT is a method that lowers the radiation dose by reducing the number of projection views. While sparse-view CT may not be useful for existing multi-detector CT (MDCT) due to its fast and continuous acquisition of projection views, there are many new applications of sparse-view CT, such as spectral CT using alternating kVp switching, dynamic beam blockers, and the like. Moreover, in C-arm CT or dental CT applications, the scan time is limited primarily by the relatively slow speed of the flat-panel detector rather than by the mechanical gantry speed, so sparse-view CT offers an opportunity to reduce the scan time.
  • However, insufficient projection views in sparse-view CT produce severe streaking artifacts in filtered-backprojection (FBP) reconstruction. To address this, conventional technologies have investigated compressed sensing approaches that minimize the total variation (TV) or other sparsity-inducing penalties under a data fidelity term. These approaches are, however, computationally expensive due to the repeated applications of projection and back-projection during iterative update steps.
  • Recently, deep learning approaches have achieved tremendous success in various fields such as classification, segmentation, denoising, and super resolution. In CT applications, a previous approach provided a systematic study of deep convolutional neural networks (CNNs) for low-dose CT and showed that a deep CNN using directional wavelets is more efficient in removing low-dose-related CT noise. This work was followed by many novel extensions for low-dose CT. Unlike these low-dose artifacts from reduced tube currents, the streaking artifacts originating from sparse projection views are globalized artifacts that are difficult to remove using conventional denoising CNNs. To address this problem, previous technologies proposed residual learning networks using U-Net. Because the streaking artifacts are globally distributed, a CNN architecture with a large receptive field was shown to be essential in these works, and their performance was significantly better than that of the existing approaches.
  • SUMMARY
  • Embodiments of the inventive concept provide a method for reconstructing a sparse-view CT image as a high-quality image using a neural network for a learning model satisfying a predetermined frame condition and an apparatus therefor.
  • According to an exemplary embodiment, an image processing method may include receiving sparse-view computed tomography (CT) data and reconstructing an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.
  • The reconstructing of the image may include reconstructing the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
  • The neural network may include a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is learned by the learning model.
  • The neural network may include a multi-resolution neural network including pooling and unpooling layers.
  • The neural network may include a structured dual frame neural network, obtained by expressing a mathematical expression of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • The neural network may include a by-pass connection from the pooling layer to the unpooling layer.
  • According to an exemplary embodiment, an image processing method may include receiving sparse-view CT data and reconstructing an image for the sparse-view CT data using a neural network for a learning model which satisfies a predetermined frame condition and is based on convolutional framelets.
  • The neural network may include a multi-resolution neural network including pooling and unpooling layers.
  • The neural network may include a structured dual frame neural network, obtained by expressing a mathematical expression of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • According to an exemplary embodiment, an image processing device may include a reception unit configured to receive sparse-view CT data and a reconstruction unit configured to reconstruct an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.
  • The reconstruction unit may be configured to reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
  • The neural network may include a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is learned by the learning model.
  • The neural network may include a multi-resolution neural network including pooling and unpooling layers.
  • The neural network may include a structured dual frame neural network, obtained by expressing a mathematical expression of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • The neural network may include a by-pass connection from the pooling layer to the unpooling layer.
  • BRIEF DESCRIPTION OF THE FIGURES
  • The above and other objects and features will become apparent from the following description with reference to the following figures, wherein like reference numerals refer to like parts throughout the various figures unless otherwise specified, and wherein:
  • (a) and (b) of FIG. 1 are drawings illustrating an example of CT streaking artifact patterns in the reconstruction images from 48 projection views;
  • (a) and (b) of FIG. 2 are drawings illustrating an example of comparing sizes of receptive fields according to a structure of a network or a neural network;
  • FIG. 3 is an operational flowchart illustrating an image processing method according to an embodiment of the inventive concept;
  • (a), (b), and (c) of FIG. 4 are drawings illustrating a simplified U-Net architecture, a dual frame U-Net architecture, and a tight frame U-Net architecture;
  • FIGS. 5A, 5B, and 5C are drawings illustrating a standard U-Net architecture, a dual frame U-Net architecture, and a tight frame U-Net architecture;
  • (a), (b), and (c) of FIG. 6 are drawings illustrating an example of reconstruction results by general, dual frame, and tight frame U-Nets at various sparse view reconstruction; and
  • FIG. 7 is a block diagram illustrating a configuration of an image processing device according to an embodiment of the inventive concept.
  • DETAILED DESCRIPTION
  • Advantages, features, and methods of accomplishing the same will become apparent with reference to embodiments described in detail below together with the accompanying drawings. However, the inventive concept is not limited by the embodiments disclosed hereinafter and may be implemented in various forms. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the invention to those skilled in the art, and the inventive concept will only be defined by the appended claims.
  • Terms used in the specification are used to describe embodiments of the inventive concept and are not intended to limit the scope of the inventive concept. In the specification, the terms of a singular form may include plural forms unless otherwise specified. The expressions "comprise" and/or "comprising" used herein indicate the existence of stated components, steps, operations, and/or elements but do not exclude the presence or addition of one or more other components, steps, operations, and/or elements.
  • Unless otherwise defined herein, all terms (including technical and scientific terms) used in the specification may have the same meaning that is generally understood by a person skilled in the art. Also, terms which are defined in a dictionary and commonly used should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
  • Hereinafter, a description will be given in detail of exemplary embodiments of the inventive concept with reference to the accompanying drawings. Like reference numerals are used for the same components shown in each drawing, and a duplicated description of the same components will be omitted.
  • The convolutional framelet may be represented for the input signal f using the local basis ψj and the non-local basis ϕi and may be represented as Equation 1 below.
  • f = (1/d) Σ_{i=1}^{n} Σ_{j=1}^{q} ⟨f, φ_i ⊛ ψ_j⟩ φ̃_i ⊛ ψ̃_j    [Equation 1]
  • Herein, ϕi denotes the linear transform operator with the non-local basis vector, and ψj denotes the linear transform operator with the local basis vector.
  • In this case, the local basis vector and the non-local basis vector may have the dual basis vectors φ̃_i and ψ̃_j, respectively, which are orthogonal to each other. The orthogonal relationship between the basis vectors may be defined as Equation 2 below.
  • Φ̃Φ^T = Σ_{i=1}^{n} φ̃_i φ_i^T = I_{n×n},   ΨΨ̃^T = Σ_{j=1}^{q} ψ_j ψ̃_j^T = I_{d×d}    [Equation 2]
  • Using Equation 2 above, the convolutional framelet may be represented as Equation 3 below.

  • ℍ_d(f) = Φ̃Φ^T ℍ_d(f) ΨΨ̃^T = Φ̃ C Ψ̃^T,   C = Φ^T ℍ_d(f) Ψ = Φ^T (f ⊛ Ψ)    [Equation 3]
  • Herein, ℍ_d denotes the Hankel matrix operator, which may allow the convolutional operation to be represented as the matrix multiplication, and C denotes the convolutional framelet coefficient which is the signal transformed by the local basis and the non-local basis.
  • The convolutional framelet coefficient C may be reconstructed as the original signal by applying the dual basis vectors φ̃_i, ψ̃_j. The reconstruction process may be represented as Equation 4 below.

  • f = (Φ̃ C) ⊛ τ(Ψ̃)    [Equation 4]
  • As such, the technique of representing the input signal through the local basis and the non-local basis may be the convolutional framelet.
  • One of the key ingredients for the deep convolutional framelets may be the frame condition for the non-local basis. However, the existing neural network architecture, for example, the U-Net architecture, does not satisfy the frame condition and it overly emphasizes the low frequency component of the signal. In sparse-view CT, this artifact may be manifested as blurring artifacts in the reconstructed images.
  • An embodiment of the inventive concept may provide two types of novel network architectures that satisfy the frame condition and may provide a dual frame network and a tight frame network.
  • Herein, the dual frame network may use a by-pass connection in the low-resolution path to generate a residual signal. The tight frame network with an orthogonal wavelet basis, for example, the Haar wavelet basis, may be implemented by adding a high-frequency path to the existing U-Net structure.
  • Mathematical Preliminaries
  • Notations
  • For a matrix A, R(A) denotes the range space of A, and P_R(A) denotes the projection onto the range space of A. The identity matrix is referred to as I. For a given matrix A, the notation A† refers to the generalized inverse matrix. The superscript T of A^T denotes the Hermitian transpose. When a matrix Ψ ∈ ℝ^{pd×q} is partitioned as Ψ = [Ψ_1^T ⋯ Ψ_p^T]^T with a submatrix Ψ_i ∈ ℝ^{d×q}, then ψ_j^i refers to the j-th column of Ψ_i. A vector ῡ ∈ ℝ^n refers to the flipped version of a vector υ ∈ ℝ^n, i.e., its indices are reversed. Similarly, for a given matrix Ψ ∈ ℝ^{d×q}, the notation Ψ̄ ∈ ℝ^{d×q} refers to the matrix of flipped vectors, i.e., Ψ̄ = [ψ̄_1 ⋯ ψ̄_q]. For a block structured matrix Ψ ∈ ℝ^{pd×q}, with a slight abuse of notation, an embodiment of the inventive concept may define Ψ̄ as Equation 5 below.
  • Ψ̄ = [ Ψ̄_1
          ⋮
          Ψ̄_p ],   where Ψ̄_i = [ψ̄_1^i ⋯ ψ̄_q^i] ∈ ℝ^{d×q}    [Equation 5]
  • Frame
  • A family of functions {φ_k}_{k∈Γ} in a Hilbert space H is called a frame when it satisfies the following inequality of Equation 6 below.
  • α‖f‖² ≤ Σ_{k∈Γ} |⟨f, φ_k⟩|² ≤ β‖f‖²,   ∀f ∈ H    [Equation 6]
  • Here, α, β>0 are called the frame bounds. When α=β, then the frame is said to be tight.
  • A frame is associated with a frame operator Φ composed of φ_k: Φ = [⋯ φ_{k-1} φ_k ⋯]. Then, Equation 6 above may be equivalently written as Equation 7 below.
  • α‖f‖² ≤ ‖Φ^T f‖² ≤ β‖f‖²,   ∀f ∈ H    [Equation 7]
  • The frame bounds may be represented by Equation 8 below.

  • α = σ_min(ΦΦ^T),   β = σ_max(ΦΦ^T)    [Equation 8]
  • Here, σ_min(A) and σ_max(A) denote the minimum and maximum singular values of A, respectively.
  • When the frame lower bound α is non-zero, because f̂ = Φ̃c = Φ̃Φ^T f = f, the recovery of the original signal may be done from the frame coefficient c = Φ^T f using the dual frame Φ̃ satisfying the frame condition represented in Equation 9 below.
  • Φ̃Φ^T = I    [Equation 9]
  • The explicit form of the dual frame may be given by the pseudo-inverse as Equation 10 below.

  • Φ̃ = (ΦΦ^T)^{-1} Φ    [Equation 10]
  • If the frame coefficients are contaminated by the noise w, i.e., c = Φ^T f + w, then the recovered signal using the dual frame is given by f̂ = Φ̃c = Φ̃(Φ^T f + w) = f + Φ̃w. Therefore, the noise amplification factor may be computed by Equation 11 below.
  • ‖Φ̃w‖² / ‖w‖² = σ_max(ΦΦ^T) / σ_min(ΦΦ^T) = β/α = κ(ΦΦ^T)    [Equation 11]
  • Here, κ(⋅) refers to the condition number.
  • A tight frame has the minimum noise amplification factor β/α = 1, and it is equivalent to the condition of Equation 12 below.
  • ΦΦ^T = cI,   c > 0    [Equation 12]
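  • The following numpy snippet illustrates Equations 8 to 12 on a small toy frame: it computes the frame bounds, forms the dual frame by the pseudo-inverse, checks the frame condition, and shows that a tight frame (here, a union of two orthonormal bases) attains condition number 1. The matrices are toy examples, not operators from the patent.

```python
import numpy as np

# Toy illustration of frame bounds, the dual frame, and the tight frame condition.
rng = np.random.default_rng(0)
n, m = 4, 7
Phi = rng.standard_normal((n, m))             # columns phi_k form a (generic) frame of R^n

G = Phi @ Phi.T
alpha, beta = np.linalg.eigvalsh(G)[[0, -1]]  # frame bounds (Equation 8)
print(alpha, beta)

Phi_dual = np.linalg.inv(G) @ Phi             # dual frame via the pseudo-inverse (Equation 10)
print(np.allclose(Phi_dual @ Phi.T, np.eye(n)))   # frame condition of Equation 9 holds

print(np.linalg.cond(G))                      # noise amplification factor beta/alpha (Equation 11)

# A tight frame (Phi Phi^T = c I) has condition number 1, e.g. a union of two
# orthonormal bases:
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
Phi_tight = np.hstack([Q1, Q2])
print(np.allclose(Phi_tight @ Phi_tight.T, 2 * np.eye(n)))   # Equation 12 with c = 2
```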
  • Hankel Matrix
  • Since the Hankel matrix is an essential component in the theory (K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” arXiv preprint arXiv:1608.03981, 2016) of deep convolutional framelets, an embodiment of the inventive concept briefly reviews it.
  • To avoid special treatment of boundary condition, an embodiment of the inventive concept may be mainly derived using the circular convolution. For simplicity, an embodiment of the inventive concept may consider 1-D signal processing, but the extension to 2-D signal processing may be straightforward.
  • Let f = [f[1], …, f[n]]^T ∈ ℝ^n be the signal vector. A wrap-around Hankel matrix ℍ_d(f) may be defined by Equation 13 below.
  • ℍ_d(f) = [ f[1]   f[2]   ⋯   f[d]
               f[2]   f[3]   ⋯   f[d+1]
                ⋮      ⋮            ⋮
               f[n]   f[1]   ⋯   f[d-1] ]    [Equation 13]
  • Here, d denotes the matrix pencil parameter.
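  • A direct numpy construction of this wrap-around Hankel matrix may look as follows (the function name is illustrative).

```python
import numpy as np

# A minimal sketch of the wrap-around Hankel matrix of Equation 13.
def wraparound_hankel(f: np.ndarray, d: int) -> np.ndarray:
    n = len(f)
    return np.array([[f[(i + j) % n] for j in range(d)] for i in range(n)])

f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(wraparound_hankel(f, 3))
# [[1. 2. 3.]
#  [2. 3. 4.]
#  [3. 4. 5.]
#  [4. 5. 1.]
#  [5. 1. 2.]]
```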
  • When a multi-channel signal is given as Equation 14 below, an extended Hankel matrix may be constructed by stacking Hankel matrices. The extended Hankel matrix may be represented as Equation 15 below.

  • F := [f_1 ⋯ f_p] ∈ ℝ^{n×p}    [Equation 14]
  • ℍ_{d|p}(F) := [ℍ_d(f_1)  ℍ_d(f_2)  ⋯  ℍ_d(f_p)]    [Equation 15]
  • Here, the Hankel matrix is closely related to the convolution operations in CNN. Specifically, for a given convolutional filter ψ = [ψ[d], …, ψ[1]]^T ∈ ℝ^d, a single-input single-output (SISO) convolution in CNN may be represented as Equation 16 below using a Hankel matrix.
  • y = f ⊛ ψ = ℍ_d(f) ψ ∈ ℝ^n    [Equation 16]
  • Similarly, a single-input multi-output (SIMO) convolution using a CNN filter kernel Ψ = [ψ_1 ⋯ ψ_q] ∈ ℝ^{d×q} may be represented by Equation 17 below.
  • Y = f ⊛ Ψ = ℍ_d(f) Ψ ∈ ℝ^{n×q}    [Equation 17]
  • Here, q denotes the number of output channels.
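  • The following numpy check illustrates Equation 16: multiplying the wrap-around Hankel matrix by the filter vector reproduces the circular, CNN-style filtering of f, here cross-checked with a standard FFT identity that is not itself part of the patent.

```python
import numpy as np

# Check of Equation 16: H_d(f) psi equals the circular (CNN-style) filtering of f.
def wraparound_hankel(f, d):
    n = len(f)
    return np.array([[f[(i + j) % n] for j in range(d)] for i in range(n)])

rng = np.random.default_rng(0)
n, d = 12, 4
f = rng.standard_normal(n)
psi = rng.standard_normal(d)

y_hankel = wraparound_hankel(f, d) @ psi        # H_d(f) psi

# The same operation written as a circular cross-correlation, evaluated with the FFT:
psi_padded = np.zeros(n)
psi_padded[:d] = psi
y_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(psi_padded))))

print(np.allclose(y_hankel, y_fft))             # True
```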
  • A multi-input multi-output (MIMO) convolution in CNN may be represented by Equation 18 below.
  • Y = F ⊛ Ψ = ℍ_{d|p}(F) [Ψ^1; ⋯; Ψ^p]    [Equation 18]
  • Here, p and q refer to the number of input and output channels, respectively.
  • The j-th input channel filter may be represented as Equation 19 below.

  • Ψ^j = [ψ_1^j ⋯ ψ_q^j] ∈ ℝ^{d×q}    [Equation 19]
  • The extension to the multi-channel 2-D convolution operation for an image domain CNN is straightforward, since similar matrix-vector operations may be used; only the definition of the Hankel matrix changes, to a block Hankel matrix or an extended block Hankel matrix.
  • One of the most intriguing properties of the Hankel matrix is that it often has a low-rank structure, and its low-rankness is related to the sparsity in the Fourier domain. This property is extremely useful, as evidenced by its applications to many inverse problems and low-level computer vision problems.
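  • The relation between low-rankness and Fourier-domain sparsity is easy to observe numerically; in the sketch below (illustrative, not part of the patent text), a signal with only two nonzero DFT coefficients produces a wrap-around Hankel matrix of rank 2.

      import numpy as np

      def wrap_hankel(f, d):
          n = len(f)
          return np.array([[f[(i + j) % n] for j in range(d)] for i in range(n)])

      n, d = 32, 8
      f = np.cos(2 * np.pi * 3 * np.arange(n) / n)      # a single real sinusoid

      print(np.sum(np.abs(np.fft.fft(f)) > 1e-8))       # 2 nonzero Fourier coefficients
      print(np.linalg.matrix_rank(wrap_hankel(f, d)))   # rank 2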
  • Deep Convolutional Framelets
  • An embodiment of the inventive concept may briefly review the theory of deep convolutional framelets. Using the existing Hankel matrix approaches, an embodiment of the inventive concept may consider the following regression problem as Equation 20 below.
  • \min_{f \in \mathbb{R}^n} \|f^* - f\|^2 \quad \text{subject to} \quad \mathrm{rank}\,\mathbb{H}_d(f) \le r < d  [Equation 20]
  • Here, f*∈ℝ^n denotes the ground-truth signal.
  • The classical approach to address the problem of Equation 20 above is to use singular value shrinkage or matrix factorization. However, in deep convolutional framelets, the problem is addressed using learning-based signal representation.
  • More specifically, for any feasible solution f for Equation 20 above, its Hankel structured matrix ℍ_d(f) has the singular value decomposition ℍ_d(f)=UΣV^T, where U=[u_1 . . . u_r]∈ℝ^{n×r} and V=[v_1 . . . v_r]∈ℝ^{d×r} denote the left and right singular vector basis matrices, respectively, and Σ=(σ_ij)∈ℝ^{r×r} is the diagonal matrix of singular values. An embodiment of the inventive concept may consider the matrix pair Φ, {tilde over (Φ)}∈ℝ^{n×n} satisfying the frame condition represented in Equation 21 below.

  • \tilde{\Phi}\Phi^T = I  [Equation 21]
  • These bases are referred to as non-local bases since they interact with all the n elements of f∈ℝ^n by multiplying them to the left of ℍ_d(f)∈ℝ^{n×d}. An embodiment of the inventive concept may need another matrix pair Ψ, {tilde over (Ψ)}∈ℝ^{d×r} satisfying the low-dimensional subspace constraint represented in Equation 22 below.

  • \Psi\tilde{\Psi}^T = P_{R(V)}  [Equation 22]
  • These may be called local bases because they only interact with a d-neighborhood of the signal f∈ℝ^n. Using Equations 21 and 22 above, an embodiment of the inventive concept may obtain the reconstruction of the Hankel matrix of the input signal f as Equation 23 below.

  • \mathbb{H}_d(f) = \tilde{\Phi}\Phi^T \mathbb{H}_d(f)\,\Psi\tilde{\Psi}^T  [Equation 23]
  • Here, ℍ_d denotes the Hankel matrix reconstruction operator.
  • Factorizing Φ^T ℍ_d(f)Ψ from Equation 23 above results in the decomposition of f using a single-layer encoder-decoder architecture as Equation 24 below.

  • f = (\tilde{\Phi}C) \circledast \nu(\tilde{\Psi}), \quad C = \Phi^T(f \circledast \bar{\Psi})  [Equation 24]
  • Here, the encoder and decoder convolution filters are given by Equation 25 below.
  • \bar{\Psi} := [\bar{\psi}_1 \ \cdots \ \bar{\psi}_q] \in \mathbb{R}^{d \times q}, \quad \nu(\tilde{\Psi}) := \frac{1}{d}\begin{bmatrix} \tilde{\psi}_1 \\ \vdots \\ \tilde{\psi}_q \end{bmatrix} \in \mathbb{R}^{dq}  [Equation 25]
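  • The identity of Equations 23 and 24 can be checked numerically with the canonical choice Φ = {tilde over (Φ)} = I and Ψ = {tilde over (Ψ)} = V. The sketch below (illustrative, not part of the patent text) uses a rank-2 test signal and a simple pseudo-inverse Hankel operator, implemented by wrap-around anti-diagonal averaging, to recover both the Hankel matrix and the signal exactly.

      import numpy as np

      def wrap_hankel(f, d):
          n = len(f)
          return np.array([[f[(i + j) % n] for j in range(d)] for i in range(n)])

      def hankel_pinv(H):
          # Averages the d entries that share the same signal index (wrap-around
          # anti-diagonals): a pseudo-inverse of the Hankel lifting, used here for illustration
          n, d = H.shape
          f = np.zeros(n)
          for i in range(n):
              for j in range(d):
                  f[(i + j) % n] += H[i, j] / d
          return f

      n, d = 32, 8
      f = np.cos(2 * np.pi * 3 * np.arange(n) / n)    # signal with a rank-2 Hankel matrix

      H = wrap_hankel(f, d)
      _, s, Vt = np.linalg.svd(H, full_matrices=False)
      Vr = Vt[s > 1e-8].T                             # right singular vectors = local bases

      H_rec = H @ Vr @ Vr.T                           # Equation 23 with Phi = tilde(Phi) = I
      assert np.allclose(H_rec, H)                    # the Hankel matrix is reproduced
      assert np.allclose(hankel_pinv(H_rec), f)       # and so is the signal f (Equation 24)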
  • Equation 24 above is the general form of the signals that are associated with a rank-r Hankel structured matrix, and an embodiment of the inventive concept is interested in specifying bases for optimal performance. In the deep convolutional framelets, Φ and {tilde over (Φ)} may correspond to the user-defined generalized pooling and unpooling chosen to satisfy the frame condition of Equation 21 above. On the other hand, the filters Ψ, {tilde over (Ψ)} need to be estimated from the data. To limit the search space for the filters, an embodiment of the inventive concept may consider the set ℱ_0, which consists of the signals that have positive framelet coefficients and may be represented as Equation 26 below.

  • \mathcal{F}_0 = \{ f \in \mathbb{R}^n \mid f = (\tilde{\Phi}C) \circledast \nu(\tilde{\Psi}),\ C = \Phi^T(f \circledast \bar{\Psi}) \ge 0 \}  [Equation 26]
  • The main goal of the neural network training is to learn (Ψ, {tilde over (Ψ)}) from the training data {(f^{(i)}, f^{*(i)})}_{i=1}^{N}, assuming that {f^{*(i)}} is associated with rank-r Hankel matrices. More specifically, the regression problem in an embodiment of the inventive concept for the training data under the rank-r Hankel matrix constraint in Equation 20 above may be given by Equation 27 below.
  • \min_{\{f^{(i)}\} \subset \mathcal{F}_0} \sum_{i=1}^{N} \|f^{*(i)} - f^{(i)}\|^2  [Equation 27]
  • Equation 27 above may be represented as Equation 28 below.
  • \min_{\{\Psi, \tilde{\Psi}\}} \sum_{i=1}^{N} \|f^{*(i)} - \mathcal{Q}(f^{(i)}; \Psi, \tilde{\Psi})\|^2  [Equation 28]
  • Here, Q may be represented as Equation 29 below.

  • \mathcal{Q}(f^{(i)}; \Psi, \tilde{\Psi}) = (\tilde{\Phi}C[f^{(i)}]) \circledast \nu(\tilde{\Psi})  [Equation 29]
  • Here, C may be represented as Equation 30 below.

  • C[f^{(i)}] = \rho(\Phi^T(f^{(i)} \circledast \bar{\Psi}))  [Equation 30]
  • Here, ρ(⋅) refers to the rectified linear unit (ReLU) to impose the positivity for the framelet coefficients.
  • After the network is fully trained, the inference for a given noisy input f is simply done by \mathcal{Q}(f; Ψ, {tilde over (Ψ)}), which is equivalent to finding a denoised solution that has a rank-r Hankel structured matrix.
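  • As a conceptual sketch only (a minimal untrained PyTorch model with an illustrative filter length and channel count, and nearest-neighbor upsampling standing in for the average unpooling; it is not the patent's trained network), the inference Q(f; Ψ, {tilde over (Ψ)}) of Equations 29 and 30 with pooling/unpooling as the non-local bases could look as follows.

      import torch
      import torch.nn as nn

      d, q = 5, 8   # illustrative filter length and channel count

      class SingleLayerFramelet(nn.Module):
          def __init__(self):
              super().__init__()
              self.encoder = nn.Conv1d(1, q, d, padding=d // 2, padding_mode='circular', bias=False)
              self.decoder = nn.Conv1d(q, 1, d, padding=d // 2, padding_mode='circular', bias=False)
              self.pool = nn.AvgPool1d(2)                                 # non-local basis Phi^T
              self.unpool = nn.Upsample(scale_factor=2, mode='nearest')   # stand-in for tilde(Phi)

          def forward(self, f):
              c = torch.relu(self.pool(self.encoder(f)))   # C = rho(Phi^T (f conv Psi)), Equation 30
              return self.decoder(self.unpool(c))          # Q(f) = (tilde(Phi) C) conv nu(tilde(Psi)), Equation 29

      f = torch.randn(1, 1, 64)              # a noisy input signal
      f_hat = SingleLayerFramelet()(f)       # one inference pass (filters here are untrained)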
  • In the sparse-view CT problems, it was consistently shown that the residual learning with a by-pass connection is better than direct image learning. To investigate this phenomenon systematically, assuming that the input image f(i) of sparse-view CT is contaminated with streaking artifacts, the input image f(i) of the sparse-view CT may be represented as Equation 31 below.

  • f^{(i)} = f^{*(i)} + h^{(i)}  [Equation 31]
  • Here, h(i) denotes the streaking artifacts and f*(i) refers to the artifact-free ground-truth image.
  • Then, instead of using the cost function of Equation 28 above, the residual network training may be formulated as Equation 32 below.
  • \min_{\{\Psi, \tilde{\Psi}\}} \sum_{i=1}^{N} \|h^{(i)} - \mathcal{Q}(f^{*(i)} + h^{(i)}; \Psi, \tilde{\Psi})\|^2  [Equation 32]
  • Here, the residual learning scheme is to find the filter Ψ which approximately annihilates the true signal f*(i) as Equation 33 below.

  • f^{*(i)} \circledast \bar{\Psi} \approx 0  [Equation 33]
  • The signal decomposition using deep convolutional framelets may be applied for the streaking artifact signal as Equation 34 below.
  • (\tilde{\Phi}C[f^{*(i)} + h^{(i)}]) \circledast \nu(\tilde{\Psi}) \approx (\tilde{\Phi}C[h^{(i)}]) \circledast \nu(\tilde{\Psi}) = h^{(i)}  [Equation 34]
  • Here, the first approximation may come from Equation 35 below thanks to the annihilating property of Equation 33 above.
  • C[f^{*(i)} + h^{(i)}] = \Phi^T((f^{*(i)} + h^{(i)}) \circledast \bar{\Psi}) \approx C[h^{(i)}]  [Equation 35]
  • Accordingly, the neural network is trained to learn the structure of the true image so that the learned filters annihilate the true image component while still retaining the artifact signals.
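  • The annihilation property of Equations 33 to 35 can be illustrated with a toy example (hand-picked signals and filter, not learned from data): a finite-difference filter annihilates a piecewise-constant true signal everywhere except at its few jump locations, so filtering the artifact-contaminated input returns essentially the filtered artifact alone.

      import numpy as np

      n = 64
      f_true = np.repeat([1.0, 3.0, 2.0, 5.0], n // 4)               # piecewise-constant true signal
      artifact = 0.5 * np.sin(2 * np.pi * 12 * np.arange(n) / n)     # oscillatory artifact h
      psi = np.array([1.0, -1.0])                                    # finite-difference filter

      def circ_filter(x, h):
          h_pad = np.zeros(len(x)); h_pad[:len(h)] = h
          return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h_pad)))

      # The filter (approximately) annihilates the true signal (Equation 33):
      # it is nonzero only at the four jump locations
      print(np.count_nonzero(np.abs(circ_filter(f_true, psi)) > 1e-9))        # 4 of 64 samples
      # Hence the filtered input differs from the filtered artifact only at those samples (Equation 35)
      diff = circ_filter(f_true + artifact, psi) - circ_filter(artifact, psi)
      print(np.count_nonzero(np.abs(diff) > 1e-9))                            # 4 of 64 samples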
  • The above-mentioned details may be extended to the multi-layer deep convolutional framelet. More specifically, for the L-layer decomposition, the space ℱ_0 in Equation 26 above may be recursively defined as Equation 36 below.
  • \mathcal{F}_0 = \{ f \in \mathbb{R}^n \mid f = (\tilde{\Phi}C) \circledast \nu(\tilde{\Psi}),\ C = \Phi^T(f \circledast \bar{\Psi}) \ge 0,\ C \in \mathcal{F}_1 \}  [Equation 36]
  • Here, ℱ_l, l=1, . . . , L−1 may be defined as Equation 37 below.

  • \mathcal{F}_l = \{ Z \in \mathbb{R}^{n \times p^{(l)}} \mid Z = (\tilde{\Phi}C^{(l)}) \circledast \nu(\tilde{\Psi}^{(l)}),\ C^{(l)} = \Phi^T(Z \circledast \bar{\Psi}^{(l)}) \ge 0,\ C^{(l)} \in \mathcal{F}_{l+1} \}

  • \mathcal{F}_L = \mathbb{R}^{n \times p^{(L)}}  [Equation 37]
  • In Equation 37 above, the l-th layer encoder and decoder filters may be defined by Equations 38 and 39 below.
  • \bar{\Psi}^{(l)} := \begin{bmatrix} \bar{\psi}_1^1 & \cdots & \bar{\psi}_{q^{(l)}}^1 \\ \vdots & & \vdots \\ \bar{\psi}_1^{p^{(l)}} & \cdots & \bar{\psi}_{q^{(l)}}^{p^{(l)}} \end{bmatrix} \in \mathbb{R}^{d^{(l)}p^{(l)} \times q^{(l)}}  [Equation 38]

  • \nu(\tilde{\Psi}^{(l)}) := \frac{1}{d}\begin{bmatrix} \tilde{\psi}_1^1 & \cdots & \tilde{\psi}_1^{p^{(l)}} \\ \vdots & & \vdots \\ \tilde{\psi}_{q^{(l)}}^1 & \cdots & \tilde{\psi}_{q^{(l)}}^{p^{(l)}} \end{bmatrix} \in \mathbb{R}^{d^{(l)}q^{(l)} \times p^{(l)}}  [Equation 39]
  • Here, d^{(l)}, p^{(l)}, q^{(l)} denote the filter length, and the number of input and output channels, respectively.
  • As described above, by recursively narrowing the search space of the convolution frames in each layer, an embodiment of the inventive concept may obtain the deep convolution framelet extension and the associated training scheme.
  • In short, the non-local bases ΦT and {tilde over (Φ)} correspond to the generalized pooling and unpooling operations, while the local bases Ψ and {tilde over (Ψ)} work as learnable convolutional filters. Moreover, for the generalized pooling operation, the frame condition may be the most important prerequisite for enabling the recovery condition and controllable shrinkage behavior, which is the main criterion for constructing the U-Net variants.
  • FIG. 3 is an operational flowchart illustrating an image processing method according to an embodiment of the inventive concept.
  • Referring to FIG. 3, the image processing method according to an embodiment of the inventive concept may include receiving (S310) sparse-view CT data and reconstructing (S320) an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.
  • Herein, operation S320 may be to reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
  • The neural network used in an embodiment of the inventive concept may include a neural network which generates a learning model which satisfies the frame condition through a mathematical analysis based on the convolutional framelets and is learned by the learning model and may include a multi-resolution neural network including pooling and unpooling layers.
  • Herein, the neural network may include a structured dual frame neural network, obtained by expressing the mathematical formulation of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • In addition, the neural network may include a by-pass connection from the pooling layer to the unpooling layer.
  • A description will be given of the method according to an embodiment of the inventive concept with reference to FIGS. 3 to 6.
  • U-Net for Sparse-View CT and its Limitations
  • (a) and (b) of FIG. 1 are drawings illustrating an example of CT streaking artifact patterns in the reconstruction images from 48 projection views. (a) and (b) of FIG. 1 show two reconstruction images and their artifact-only images when only 48 projection views are available.
  • As shown in (a) and (b) of FIG. 1, there are significant streaking artifacts that spread over the entire image area. This suggests that the receptive field of the convolution filter should cover the entire area of the image to effectively suppress the streaking artifacts.
  • One of the most important characteristics of multi-resolution architecture like U-Net (O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234-241.) is the exponentially large receptive field due to the pooling and unpooling layers.
  • (a) and (b) of FIG. 2 are drawings illustrating an example of comparing sizes of receptive fields according to the structure of a network or a neural network. (a) and (b) of FIG. 2 compare the size of the receptive field of a multi-resolution network such as U-Net ((b) of FIG. 2) and a single-resolution CNN without pooling layers ((a) of FIG. 2). As shown in (a) of FIG. 2, for the general neural network, the receptive field becomes larger as the signal passes through each convolution layer (Conv). However, because the rate of increase is low, a very deep neural network architecture is required for the receptive field to cover the entire image. When the depth of the neural network becomes deeper and deeper, however, it may face the gradient vanishing problem, which makes training the neural network difficult, or overfitting may occur due to the large number of parameters. On the other hand, as shown in (b) of FIG. 2, for U-Net, because the input image decreases in size as it passes through the pooling and unpooling layers, the receptive field relative to the input image becomes comparatively larger. Due to this, compared with a general neural network of the same depth, an embodiment of the inventive concept may have a larger receptive field. Thus, a multi-resolution architecture with a large receptive field, for example, the U-Net architecture, is mainly used when performing segmentation in an image. However, from the theoretical viewpoint, a clear limit exists in image reconstruction using the U-Net architecture, as discussed below. Furthermore, as may be observed in (a) and (b) of FIG. 2, with the same size convolutional filters, the receptive field is enlarged in the network with pooling layers. Thus, the multi-resolution architecture, such as U-Net, is well suited to sparse-view CT reconstruction, which must deal with globally distributed streaking artifacts.
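  • The growth of the receptive field can be quantified with the standard receptive-field recursion. The sketch below (the layer counts are arbitrary illustrations, not the exact networks of FIG. 2) shows the linear growth of a plain CNN versus the much faster growth once pooling layers are inserted.

      def receptive_field(layers):
          """layers: list of (kernel_size, stride) tuples from input to output."""
          rf, jump = 1, 1
          for k, s in layers:
              rf += (k - 1) * jump
              jump *= s
          return rf

      plain = [(3, 1)] * 10                      # ten plain 3x3 convolutions
      pooled = [(3, 1), (3, 1), (2, 2)] * 5      # two 3x3 convolutions then 2x2 pooling, repeated

      print(receptive_field(plain))    # 21: grows only linearly with depth
      print(receptive_field(pooled))   # 156: grows roughly exponentially with the number of pooling stages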
  • (a) of FIG. 4 shows a simplified U-Net architecture. As shown in (a) of FIG. 4, the U-Net delivers the signal of the input unit to the output unit by using the average pooling layer and the average unpooling layer as the non-local bases and through the by-pass connection layer expressed by a dotted line.
  • The U-Net is recursively applied to the low-resolution signal. Here, the input f∈ℝ^n is first filtered with the local convolutional filter Ψ and is then reduced to a half-size approximate signal using the pooling operation Φ^T. Mathematically, this step may be represented by Equation 40 below.

  • C = \Phi^T(f \circledast \bar{\Psi}) = \Phi^T\mathbb{H}_d(f)\bar{\Psi}  [Equation 40]
  • Here, f ⊛ {overscore (Ψ)} denotes the multi-channel convolution in CNN. For the case of average pooling, the pooling operator Φ^T is given by Equation 41 below.
  • \Phi^T = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 1 \end{bmatrix} \in \mathbb{R}^{\frac{n}{2} \times n}  [Equation 41]
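  • A quick numerical check of the average pooling operator (a sketch with an illustrative signal length, assuming the 1/√2 normalization that makes the columns of Φ orthonormal):

      import numpy as np

      n = 8
      Phi = np.zeros((n, n // 2))
      for k in range(n // 2):
          Phi[2 * k, k] = Phi[2 * k + 1, k] = 1 / np.sqrt(2)   # Phi^T is the pooling of Equation 41

      assert np.allclose(Phi.T @ Phi, np.eye(n // 2))          # orthogonality used in Equation 47
      P = Phi @ Phi.T
      assert np.allclose(P @ P, P)                             # Phi Phi^T is the projection P_R(Phi)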
  • As shown in (a) of FIG. 4, the U-Net has the by-pass connection to compensate for the lost high frequency during pooling. Combining the two, the convolutional framelet coefficients may be represented by Equation 42 below.
  • C_{ext} = \Phi_{ext}^T(f \circledast \bar{\Psi}) = \begin{bmatrix} B \\ S \end{bmatrix}  [Equation 42]
  • Here, Φext T refers to the extended pooling, B refers to the bypass component, and S refers to the low pass subband.
  • Φext T may be given by Equation 43 below, and B and S may be represented as Equation 44 below.
  • \Phi_{ext} := \begin{bmatrix} I & \Phi \end{bmatrix}  [Equation 43]

  • B = f \circledast \bar{\Psi}, \quad S = \Phi^T(f \circledast \bar{\Psi})  [Equation 44]
  • Thus, Equation 45 below may be derived using the above-mentioned equations.

  • \Phi_{ext}\Phi_{ext}^T = I + \Phi\Phi^T  [Equation 45]
  • Here, ΦΦ^T = P_{R(Φ)} for the case of average pooling.
  • Thus, Φext does not satisfy the frame condition, which results in artifacts.
  • In other words, because the signal reconstructed through the neural network of the general U-Net architecture does not express the original signal perfectly and leads to an overemphasis of the low frequency components, the blurred reconstruction signal may be generated.
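  • Equation 45 can be checked directly. The sketch below (illustrative size, same pooling assumptions as above) shows that Φ_ext Φ_ext^T = I + ΦΦ^T has eigenvalues 1 and 2 rather than a constant multiple of the identity, so the extended pooling of the standard U-Net is not a tight frame.

      import numpy as np

      n = 8
      Phi = np.zeros((n, n // 2))
      for k in range(n // 2):
          Phi[2 * k, k] = Phi[2 * k + 1, k] = 1 / np.sqrt(2)

      Phi_ext = np.hstack([np.eye(n), Phi])        # Equation 43
      G = Phi_ext @ Phi_ext.T                      # Equation 45: I + Phi Phi^T
      print(np.round(np.linalg.eigvalsh(G), 6))    # eigenvalues 1 and 2, not a constant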
  • Dual Frame U-Net
  • As described above, one simple fix for the aforementioned limitation is using the dual frame. Specifically, using Equation 10 above, the dual frame for Φext in Equation 43 above may be obtained as Equation 46 below.

  • \tilde{\Phi}_{ext} = (\Phi_{ext}\Phi_{ext}^T)^{-1}\Phi_{ext} = (I + \Phi\Phi^T)^{-1}\begin{bmatrix} I & \Phi \end{bmatrix}  [Equation 46]
  • Here, thanks to the matrix inversion lemma and the orthogonality Φ^TΦ = I for the case of average pooling, an embodiment of the inventive concept may obtain Equation 47 below.

  • (I + \Phi\Phi^T)^{-1} = I - \Phi(I + \Phi^T\Phi)^{-1}\Phi^T = I - \tfrac{1}{2}\Phi\Phi^T  [Equation 47]
  • Thus, the dual frame may be given by Equation 48 below.

  • \tilde{\Phi}_{ext} = (I - \Phi\Phi^T/2)\begin{bmatrix} I & \Phi \end{bmatrix} = \begin{bmatrix} I - \Phi\Phi^T/2 & \Phi/2 \end{bmatrix}  [Equation 48]
  • For a given framelet coefficient Cext in Equation 42 above, the reconstruction using the dual frame may be given by Equation 49 below.
  • \hat{C}_{ext} := \tilde{\Phi}_{ext}C_{ext} = \left(I - \frac{\Phi\Phi^T}{2}\right)B + \frac{1}{2}\Phi S = B + \underbrace{\frac{1}{2}\Phi}_{\text{unpooling}}\underbrace{(S - \Phi^T B)}_{\text{residual}}  [Equation 49]
  • Constructing the neural network based on Equation 49 above may suggest a network structure for the dual frame U-Net. More specifically, unlike the U-Net, the residual signal at the low resolution may be upsampled through the unpooling layer. This may be easily implemented using an additional by-pass connection for the low-resolution signal as shown in (b) of FIG. 4. This simple fix allows the network to satisfy the frame condition. However, some noise amplification remains, since Φ_ext Φ_ext^T = I + ΦΦ^T = I + P_{R(Φ)} is not a tight frame.
  • Similar to the U-Net, the final step of the dual frame U-Net is the concatenation and the multi-channel convolution, which is equivalent to applying the inverse Hankel operation, i.e., ℍ_d^†(⋅), to the processed framelet coefficients multiplied with the local basis. Specifically, the concatenated signal may be given by Equation 50 below, and the final convolution may be equivalently computed by Equation 51 below.
  • W = \begin{bmatrix} B & \frac{1}{2}\Phi(S - \Phi^T B) \end{bmatrix}  [Equation 50]

  • \hat{f} = \mathbb{H}_d^{\dagger}\left(W\begin{bmatrix} \Xi \\ \Xi \end{bmatrix}\right) = \mathbb{H}_d^{\dagger}(B\Xi) + \frac{1}{2}\mathbb{H}_d^{\dagger}(\Phi S\Xi) - \frac{1}{2}\mathbb{H}_d^{\dagger}(\Phi\Phi^T B\Xi) = \mathbb{H}_d^{\dagger}(\mathbb{H}_d(f)\bar{\Psi}\Xi) = \frac{1}{d}\sum_{i=1}^{q}(f \circledast \bar{\psi}_i) \circledast \xi_i  [Equation 51]
  • Here, the third equality in Equation 51 above comes from S = Φ^T(f ⊛ {overscore (Ψ)}) = Φ^T B. Therefore, by choosing the local filter basis such that {overscore (Ψ)}Ξ = I, the right-hand side of Equation 51 above becomes equal to f, satisfying the recovery condition.
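  • The dual frame of Equations 46 to 48 can be verified numerically. The sketch below (illustrative size, same pooling assumptions as above) confirms that the closed form [I - ΦΦ^T/2  Φ/2] matches Equation 46 and satisfies the frame condition of Equation 21.

      import numpy as np

      n = 8
      Phi = np.zeros((n, n // 2))
      for k in range(n // 2):
          Phi[2 * k, k] = Phi[2 * k + 1, k] = 1 / np.sqrt(2)

      Phi_ext = np.hstack([np.eye(n), Phi])                                  # Equation 43
      Phi_dual = np.linalg.inv(Phi_ext @ Phi_ext.T) @ Phi_ext                # Equation 46
      Phi_dual_closed = np.hstack([np.eye(n) - Phi @ Phi.T / 2, Phi / 2])    # Equation 48

      assert np.allclose(Phi_dual, Phi_dual_closed)
      assert np.allclose(Phi_dual @ Phi_ext.T, np.eye(n))                    # frame condition (Equation 21)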
  • Tight Frame U-Net
  • Another way to improve the performance of the U-Net with minimum noise amplification is to use tight filter-bank frames or wavelets. Specifically, the non-local basis Φ^T may be composed of a filter bank as represented in Equation 52 below.

  • \Phi = \begin{bmatrix} T_1 & \cdots & T_L \end{bmatrix}  [Equation 52]
  • Here, Tk denotes the k-th subband operator.
  • An embodiment of the inventive concept assumes that the filter bank is tight, i.e., that it satisfies Equation 53 below for some scalar c>0.
  • \Phi\Phi^T = \sum_{k=1}^{L} T_k T_k^T = cI  [Equation 53]
  • The convolutional framelet coefficients including a by-pass connection may be written by Equation 54 below.

  • C_{ext} := \Phi_{ext}^T(f \circledast \bar{\Psi}) = \begin{bmatrix} B^T & S_1^T & \cdots & S_L^T \end{bmatrix}^T  [Equation 54]
  • Here, Φ_ext := [I T_1 . . . T_L], B = f ⊛ {overscore (Ψ)}, and S_k = T_k^T B.
  • An embodiment of the inventive concept may see that Φext is also a tight frame by Equation 55 below.
  • \Phi_{ext}\Phi_{ext}^T = I + \sum_{k=1}^{L} T_k T_k^T = (c+1)I  [Equation 55]
  • There are several important tight filter bank frames. One of the simplest is the Haar wavelet transform with low-pass and high-pass subband decomposition, where T_1 is the low-pass subband operator, which is equivalent to the average pooling in Equation 41 above, and T_2 is the high-pass subband operator. The high-pass filtering T_2 may be given by Equation 56 below.
  • T_2^T = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & -1 \end{bmatrix}  [Equation 56]
  • An embodiment of the inventive concept may see that T_1T_1^T + T_2T_2^T = I, so the Haar wavelet frame is tight. In the corresponding tight frame U-Net structure, as illustrated in (c) of FIG. 4, in contrast to the U-Net structure in (a) of FIG. 4, there is an additional high-pass branch. As shown in (c) of FIG. 4, similar to the U-Net in (a) of FIG. 4, in the tight frame U-Net, each subband signal is by-passed to the individual concatenation layers. The convolutional layer after the concatenation layers may provide a weighted sum whose weights are learned from data. This simple fix makes the frame tight.
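  • The Haar pair indeed forms a tight frame; the sketch below (illustrative size, with the 1/√2 normalization assumed as above) verifies T_1T_1^T + T_2T_2^T = I and Equation 55 with c = 1.

      import numpy as np

      n = 8
      T1 = np.zeros((n, n // 2)); T2 = np.zeros((n, n // 2))
      for k in range(n // 2):
          T1[2 * k, k], T1[2 * k + 1, k] = 1 / np.sqrt(2), 1 / np.sqrt(2)    # low-pass (average pooling)
          T2[2 * k, k], T2[2 * k + 1, k] = 1 / np.sqrt(2), -1 / np.sqrt(2)   # high-pass (Equation 56)

      assert np.allclose(T1 @ T1.T + T2 @ T2.T, np.eye(n))                   # the Haar frame is tight

      Phi_ext = np.hstack([np.eye(n), T1, T2])
      assert np.allclose(Phi_ext @ Phi_ext.T, 2 * np.eye(n))                 # (c + 1) I with c = 1 (Equation 55)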
  • In other words, the neural network of the tight frame U-Net structure shown in (c) of FIG. 4 may be expressed as a neural network which has the tight filter-bank or wavelet as the non-local basis so as to satisfy the frame condition. Herein, the non-local basis of the tight frame U-Net may satisfy the tight frame condition.
  • The nonlinear operation may impose sparsity on the signal for various input signals f or may impose positivity on the signal. This may enable the neural network to learn various input signals or the transformed signal, and may enable the local and non-local basis vectors of the linear transform operation to find various solutions. Moreover, the nonlinear operation may be constructed in a form that satisfies the reconstruction condition.
  • The residual learning is applicable to the neural network to enhance the learning effect. The residual learning may make the local basis of the linear transform lower rank, so the unnecessary load of the neural network may be greatly reduced. Such an internal by-pass connection or an external by-pass connection may overcome the difficulty of the deep network training to improve the performance of removing the local noise and the non-local noise.
  • FIGS. 5A to 5C are drawings illustrating a standard U-Net architecture (5A), a dual frame U-Net architecture (5B), and a tight frame U-Net architecture (5C).
  • As shown in FIGS. 5A to 5C, each network may include a convolution layer for performing the linear transform operation, a batch normalization layer for performing the normalization operation, a rectified linear unit (ReLU) layer for performing the nonlinear function operation, and a path connection with concatenation. Specifically, each stage may include four sequential layers composed of convolution with 3×3 kernels, batch normalization, and ReLU layers. The last stage may include two sequential layers, and the last layer may include only the convolution layer with a 1×1 kernel. The number of channels for each convolution layer is illustrated in FIGS. 5A to 5C; the number of channels may be doubled after each pooling layer. The differences between the U-Net and the dual frame U-Net or the tight frame U-Net arise from the pooling and unpooling layers.
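  • As an illustrative sketch only (a minimal PyTorch module with assumed parameter placement, in particular where the channel-doubling convolution sits; it is not the exact implementation behind FIGS. 5A to 5C), one encoder stage of the described architecture, i.e., repeated 3×3 convolution, batch normalization and ReLU followed by pooling, could be written as follows.

      import torch
      import torch.nn as nn

      def conv_bn_relu(in_ch, out_ch):
          # One "3x3 convolution + batch normalization + ReLU" unit of FIGS. 5A to 5C
          return nn.Sequential(
              nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
              nn.BatchNorm2d(out_ch),
              nn.ReLU(inplace=True),
          )

      class EncoderStage(nn.Module):
          # Four conv-BN-ReLU units followed by 2x2 pooling; the first unit doubles the
          # channel count coming out of the previous pooling layer (an assumed placement)
          def __init__(self, in_ch):
              super().__init__()
              out_ch = 2 * in_ch
              self.blocks = nn.Sequential(
                  conv_bn_relu(in_ch, out_ch),
                  *[conv_bn_relu(out_ch, out_ch) for _ in range(3)],
              )
              self.pool = nn.AvgPool2d(2)

          def forward(self, x):
              x = self.blocks(x)
              return self.pool(x), x        # pooled features and the by-pass (skip) signal

      x = torch.randn(1, 32, 128, 128)
      pooled, skip = EncoderStage(32)(x)    # pooled: (1, 64, 64, 64), skip: (1, 64, 128, 128)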
  • As shown in FIG. 5A, because the pooling and unpooling layers included in the U-Net structure do not satisfy the frame condition, the standard U-Net may show the limitation of the reconstruction level from the signal point of view. An embodiment of the inventive concept may mathematically prove the limit of the existing U-Net structure and may formulate the theory capable of overcoming the limit based on it, thus providing the dual frame U-Net and the tight frame U-Net which are the neural network architecture satisfying the frame condition.
  • As shown in FIG. 5B, the dual frame U-Net is the structured neural network architecture obtained by expressing the mathematical formulation of the U-Net as a dual frame; it is proposed to have a similar amount of computation to the U-Net structure and to satisfy the frame condition by adding a residual path while maintaining the general U-Net structure.
  • As shown in FIG. 5C, the tight frame U-Net may decompose the signal into a low-frequency domain and a high-frequency domain using wavelets. The low-frequency domain may be decomposed stage by stage in the same manner as the operation performed in the general U-Net structure, while the high-frequency domain may be reconstructed without losing the high-frequency signal, by designing a path that passes it to the corresponding layer on the opposite side.
  • (a) to (c) of FIG. 6 are drawings illustrating an example of reconstruction results by the general, dual frame, and tight frame U-Nets at various sparse-view reconstructions. The left box in each image region illustrates the enlarged images, and the right box illustrates the difference images. The number written on each image is the normalized mean square error (NMSE) value.
  • As shown in the enlarged images and the difference images in (a) to (c) of FIG. 6, the U-Net produces blurred edges in many areas, while the dual frame and tight frame U-Nets may enhance the high-frequency characteristics of the images. In other words, the dual frame U-Net and the tight frame U-Net according to an embodiment of the inventive concept may reduce the blurring of the images, which is the limitation of the general U-Net, and may reconstruct all sparse-view images using a single neural network without the correction of additional parameters by simultaneously learning various sparse-view images.
  • FIG. 7 is a block diagram illustrating a configuration of an image processing device according to an embodiment of the inventive concept and illustrates a configuration of a device which performs the methods in FIGS. 3 to 6.
  • Referring to FIG. 7, an image processing device 700 according to an embodiment of the inventive concept may include a reception unit 710 and a reconstruction unit 720.
  • The reception unit 710 may receive sparse-view CT data.
  • The reconstruction unit 720 may reconstruct an image for the sparse-view CT data using a neural network for a learning model which satisfies a predetermined frame condition and is based on the convolutional framelets.
  • Herein, the reconstruction unit 720 may reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by the residual learning.
  • The neural network used in the device according to an embodiment of the inventive concept may include a neural network which generates a learning model satisfying the frame condition through a mathematical analysis based on the convolutional framelets and is learned by the learning model and may include a multi-resolution neural network including pooling and unpooling layers.
  • Herein, the multi-resolution neural network may include a structured dual frame neural network, obtained by expressing the mathematical formulation of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.
  • In addition, the neural network may include a by-pass connection from the pooling layer to the unpooling layer.
  • It is apparent to those skilled in the art that, although the description is omitted in the image processing device 700 of FIG. 7, the respective components configuring FIG. 7 may include all details described in FIGS. 1 to 6.
  • The foregoing devices may be realized by hardware elements, software elements and/or combinations thereof. For example, the devices and components illustrated in the exemplary embodiments of the inventive concept may be implemented in one or more general-use computers or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor or any device which may execute instructions and respond. A processing unit may implement an operating system (OS) or one or more software applications running on the OS. Further, the processing unit may access, store, manipulate, process and generate data in response to execution of software. It will be understood by those skilled in the art that although a single processing unit may be illustrated for convenience of understanding, the processing unit may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing unit may include a plurality of processors or one processor and one controller. Also, the processing unit may have a different processing configuration, such as a parallel processor.
  • Software may include computer programs, codes, instructions or one or more combinations thereof and may configure a processing unit to operate in a desired manner or may independently or collectively control the processing unit. Software and/or data may be permanently or temporarily embodied in any type of machine, components, physical equipment, virtual equipment, computer storage media or units or transmitted signal waves so as to be interpreted by the processing unit or to provide instructions or data to the processing unit. Software may be dispersed throughout computer systems connected via networks and may be stored or executed in a dispersion manner. Software and data may be recorded in one or more computer-readable storage media.
  • The methods according to the above-described exemplary embodiments of the inventive concept may be implemented with program instructions which may be executed through various computer means and may be recorded in computer-readable media. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded in the media may be designed and configured specially for the exemplary embodiments of the inventive concept or be known and available to those skilled in computer software. Computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as compact disc-read only memory (CD-ROM) disks and digital versatile discs (DVDs); magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Program instructions include both machine codes, such as produced by a compiler, and higher level codes that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules to perform the operations of the above-described exemplary embodiments of the inventive concept, or vice versa.
  • According to embodiments of the inventive concept, the image processing device may reconstruct a sparse-view CT image as a high-quality image using the neural network for the learning model satisfying the predetermined frame condition.
  • According to embodiments of the inventive concept, the image processing device may reconstruct a high-quality image while having the similar amount of calculation to the existing neural network architecture by mathematically proving the limit of the existing multi-resolution neural network, for example, the U-Net structure, formulating the theory capable of overcoming the limit based on it, providing the neural network satisfying the frame condition, and reconstructing the sparse-view CT image by means of the neural network.
  • While a few exemplary embodiments have been shown and described with reference to the accompanying drawings, it will be apparent to those skilled in the art that various modifications and variations can be made from the foregoing descriptions. For example, adequate effects may be achieved even if the foregoing processes and methods are carried out in different order than described above, and/or the aforementioned elements, such as systems, structures, devices, or circuits, are combined or coupled in different forms and modes than as described above or be substituted or switched with other components or equivalents.
  • Therefore, other implements, other embodiments, and equivalents to claims are within the scope of the following claims.

Claims (15)

What is claimed is:
1. An image processing method, comprising:
receiving a sparse-view computed tomography (CT) data; and
reconstructing an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.
2. The image processing method of claim 1, wherein the reconstructing of the image comprises:
reconstructing the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
3. The image processing method of claim 1, wherein the neural network comprises:
a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is learned by the learning model.
4. The image processing method of claim 1, wherein the neural network comprises:
a multi-resolution neural network including pooling and unpooling layers.
5. The image processing method of claim 4, wherein the neural network comprises:
a structured tight frame neural network by decomposing a structured dual frame neural network and the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets by expressing a mathematical expression of the multi-resolution neural network as a dual frame.
6. The image processing method of claim 4, wherein the neural network comprises:
a by-pass connection from the pooling layer to the unpooling layer.
7. An image processing method, comprising:
receiving a sparse-view CT data; and
reconstructing an image for the sparse-view CT data using a neural network for a learning model which satisfies a predetermined frame condition and is based on convolutional framelets.
8. The image processing method of claim 7, wherein the neural network comprises:
a multi-resolution neural network including pooling and unpooling layers.
9. The image processing method of claim 8, wherein the neural network comprises:
a structured tight frame neural network by decomposing a structured dual frame neural network and the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets by expressing a mathematical expression of the multi-resolution neural network as a dual frame.
10. An image processing device, comprising:
a reception unit configured to receive a sparse-view CT data; and
a reconstruction unit configured to reconstruct an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.
11. The image processing device of claim 10, wherein the reconstruction unit is configured to:
reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
12. The image processing device of claim 10, wherein the neural network comprises:
a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is learned by the learning model.
13. The image processing device of claim 10, wherein the neural network comprises:
a multi-resolution neural network including pooling and unpooling layers.
14. The image processing device of claim 13, wherein the neural network comprises:
a structured tight frame neural network by decomposing a structured dual frame neural network and the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets by expressing a mathematical expression of the multi-resolution neural network as a dual frame.
15. The image processing device of claim 13, wherein the neural network comprises:
a by-pass connection from the pooling layer to the unpooling layer.
US16/365,498 2018-05-29 2019-03-26 Method for processing sparse-view computed tomography image using neural network and apparatus therefor Active 2039-05-21 US10991132B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020180060849A KR102094598B1 (en) 2018-05-29 2018-05-29 Method for processing sparse-view computed tomography image using artificial neural network and apparatus therefor
KR10-2018-0060849 2018-05-29

Publications (2)

Publication Number Publication Date
US20190371018A1 true US20190371018A1 (en) 2019-12-05
US10991132B2 US10991132B2 (en) 2021-04-27

Family

ID=68692744

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/365,498 Active 2039-05-21 US10991132B2 (en) 2018-05-29 2019-03-26 Method for processing sparse-view computed tomography image using neural network and apparatus therefor

Country Status (2)

Country Link
US (1) US10991132B2 (en)
KR (1) KR102094598B1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874828A (en) * 2020-01-20 2020-03-10 上海尽星生物科技有限责任公司 Neural network model and ultrasonic beam forming method based on neural network model
CN111028512A (en) * 2019-12-31 2020-04-17 福建工程学院 Real-time traffic prediction method and device based on sparse BP neural network
US20200196973A1 (en) * 2018-12-21 2020-06-25 Canon Medical Systems Corporation Apparatus and method for dual-energy computed tomography (ct) image reconstruction using sparse kvp-switching and deep learning
US20200196972A1 (en) * 2018-12-20 2020-06-25 Canon Medical Systems Corporation Apparatus and method that uses deep learning to correct computed tomography (ct) with sinogram completion of projection data
US10991132B2 (en) * 2018-05-29 2021-04-27 Korea Advanced Institute Of Science And Technology Method for processing sparse-view computed tomography image using neural network and apparatus therefor
WO2021151272A1 (en) * 2020-05-20 2021-08-05 平安科技(深圳)有限公司 Method and apparatus for cell image segmentation, and electronic device and readable storage medium
WO2021159234A1 (en) * 2020-02-10 2021-08-19 深圳先进技术研究院 Image processing method and apparatus, and computer-readable storage medium
WO2021182103A1 (en) * 2020-03-11 2021-09-16 国立大学法人筑波大学 Trained model generation program, image generation program, trained model generation device, image generation device, trained model generation method, and image generation method
US20210383582A1 (en) * 2020-06-08 2021-12-09 GE Precision Healthcare LLC Systems and methods for a stationary ct imaging system
US20220122235A1 (en) * 2020-10-16 2022-04-21 Microsoft Technology Licensing, Llc Dual-Stage System for Computational Photography, and Technique for Training Same
CN114494482A (en) * 2021-12-24 2022-05-13 中国人民解放军总医院第一医学中心 Method for generating CT blood vessel imaging based on flat scanning CT
US20220240879A1 (en) * 2021-02-01 2022-08-04 Medtronic Navigation, Inc. Systems and methods for low-dose ai-based imaging

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110992295B (en) * 2019-12-20 2022-04-19 电子科技大学 Low-dose CT reconstruction method based on wavelet-RED convolution neural network
CN111275083B (en) * 2020-01-15 2021-06-18 浙江工业大学 Optimization method for realizing residual error network characteristic quantity matching
CN112669401B (en) * 2020-12-22 2022-08-19 中北大学 CT image reconstruction method and system based on convolutional neural network
KR102553068B1 (en) * 2021-04-20 2023-07-06 연세대학교 원주산학협력단 Sparse-view ct reconstruction based on multi-level wavelet convolutional neural network
CN113487638A (en) * 2021-07-06 2021-10-08 南通创越时空数据科技有限公司 Ground feature edge detection method of high-precision semantic segmentation algorithm U2-net

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9508164B2 (en) * 2013-01-23 2016-11-29 Czestochowa University Of Technology Fast iterative image reconstruction method for 3D computed tomography
US9816948B2 (en) * 2014-04-18 2017-11-14 University Of Georgia Research Foundation, Inc. Computerized tomography detection of microbial damage of plant tissues
US20190216409A1 (en) * 2018-01-15 2019-07-18 Siemens Healthcare Gmbh Method and System for 3D Reconstruction of X-ray CT Volume and Segmentation Mask from a Few X-ray Radiographs
US10475214B2 (en) * 2017-04-05 2019-11-12 General Electric Company Tomographic reconstruction based on deep learning
US10489939B2 (en) * 2015-09-09 2019-11-26 Tsinghua University Spectral CT image reconstructing method and spectral CT imaging system
US20190384963A1 (en) * 2016-12-01 2019-12-19 Berkeley Lights, Inc. Automated detection and repositioning of micro-objects in microfluidic devices
US10628973B2 (en) * 2017-01-06 2020-04-21 General Electric Company Hierarchical tomographic reconstruction
US10685429B2 (en) * 2017-02-22 2020-06-16 Siemens Healthcare Gmbh Denoising medical images by learning sparse image representations with a deep unfolding approach
US10733745B2 (en) * 2019-01-07 2020-08-04 The University Of North Carolina At Chapel Hill Methods, systems, and computer readable media for deriving a three-dimensional (3D) textured surface from endoscopic video

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102094598B1 (en) * 2018-05-29 2020-03-27 한국과학기술원 Method for processing sparse-view computed tomography image using artificial neural network and apparatus therefor

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9508164B2 (en) * 2013-01-23 2016-11-29 Czestochowa University Of Technology Fast iterative image reconstruction method for 3D computed tomography
US9816948B2 (en) * 2014-04-18 2017-11-14 University Of Georgia Research Foundation, Inc. Computerized tomography detection of microbial damage of plant tissues
US10489939B2 (en) * 2015-09-09 2019-11-26 Tsinghua University Spectral CT image reconstructing method and spectral CT imaging system
US20190384963A1 (en) * 2016-12-01 2019-12-19 Berkeley Lights, Inc. Automated detection and repositioning of micro-objects in microfluidic devices
US10628973B2 (en) * 2017-01-06 2020-04-21 General Electric Company Hierarchical tomographic reconstruction
US10685429B2 (en) * 2017-02-22 2020-06-16 Siemens Healthcare Gmbh Denoising medical images by learning sparse image representations with a deep unfolding approach
US10475214B2 (en) * 2017-04-05 2019-11-12 General Electric Company Tomographic reconstruction based on deep learning
US20190216409A1 (en) * 2018-01-15 2019-07-18 Siemens Healthcare Gmbh Method and System for 3D Reconstruction of X-ray CT Volume and Segmentation Mask from a Few X-ray Radiographs
US10733745B2 (en) * 2019-01-07 2020-08-04 The University Of North Carolina At Chapel Hill Methods, systems, and computer readable media for deriving a three-dimensional (3D) textured surface from endoscopic video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Kim USPAP 2019/0384,963 *
Zhou USPAP 2019/0216,409 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10991132B2 (en) * 2018-05-29 2021-04-27 Korea Advanced Institute Of Science And Technology Method for processing sparse-view computed tomography image using neural network and apparatus therefor
US20200196972A1 (en) * 2018-12-20 2020-06-25 Canon Medical Systems Corporation Apparatus and method that uses deep learning to correct computed tomography (ct) with sinogram completion of projection data
US11039806B2 (en) * 2018-12-20 2021-06-22 Canon Medical Systems Corporation Apparatus and method that uses deep learning to correct computed tomography (CT) with sinogram completion of projection data
US11864939B2 (en) 2018-12-20 2024-01-09 Canon Medical Systems Corporation Apparatus and method that uses deep learning to correct computed tomography (CT) with sinogram completion of projection data
US20200196973A1 (en) * 2018-12-21 2020-06-25 Canon Medical Systems Corporation Apparatus and method for dual-energy computed tomography (ct) image reconstruction using sparse kvp-switching and deep learning
US10945695B2 (en) * 2018-12-21 2021-03-16 Canon Medical Systems Corporation Apparatus and method for dual-energy computed tomography (CT) image reconstruction using sparse kVp-switching and deep learning
CN111028512A (en) * 2019-12-31 2020-04-17 福建工程学院 Real-time traffic prediction method and device based on sparse BP neural network
CN110874828A (en) * 2020-01-20 2020-03-10 上海尽星生物科技有限责任公司 Neural network model and ultrasonic beam forming method based on neural network model
WO2021159234A1 (en) * 2020-02-10 2021-08-19 深圳先进技术研究院 Image processing method and apparatus, and computer-readable storage medium
WO2021182103A1 (en) * 2020-03-11 2021-09-16 国立大学法人筑波大学 Trained model generation program, image generation program, trained model generation device, image generation device, trained model generation method, and image generation method
WO2021151272A1 (en) * 2020-05-20 2021-08-05 平安科技(深圳)有限公司 Method and apparatus for cell image segmentation, and electronic device and readable storage medium
US20210383582A1 (en) * 2020-06-08 2021-12-09 GE Precision Healthcare LLC Systems and methods for a stationary ct imaging system
US11696733B2 (en) * 2020-06-08 2023-07-11 GE Precision Healthcare LLC Systems and methods for a stationary CT imaging system
US20220122235A1 (en) * 2020-10-16 2022-04-21 Microsoft Technology Licensing, Llc Dual-Stage System for Computational Photography, and Technique for Training Same
US11669943B2 (en) * 2020-10-16 2023-06-06 Microsoft Technology Licensing, Llc Dual-stage system for computational photography, and technique for training same
US20220240879A1 (en) * 2021-02-01 2022-08-04 Medtronic Navigation, Inc. Systems and methods for low-dose ai-based imaging
US11890124B2 (en) * 2021-02-01 2024-02-06 Medtronic Navigation, Inc. Systems and methods for low-dose AI-based imaging
CN114494482A (en) * 2021-12-24 2022-05-13 中国人民解放军总医院第一医学中心 Method for generating CT blood vessel imaging based on flat scanning CT

Also Published As

Publication number Publication date
KR20190135616A (en) 2019-12-09
KR102094598B1 (en) 2020-03-27
US10991132B2 (en) 2021-04-27

Similar Documents

Publication Publication Date Title
US10991132B2 (en) Method for processing sparse-view computed tomography image using neural network and apparatus therefor
Han et al. Framing U-Net via deep convolutional framelets: Application to sparse-view CT
US10853977B2 (en) Apparatus and method for reconstructing image using extended neural network
Ye et al. Deep convolutional framelets: A general deep learning framework for inverse problems
KR102089151B1 (en) Method and apparatus for reconstructing image based on neural network
Kang et al. A deep convolutional neural network using directional wavelets for low‐dose X‐ray CT reconstruction
Selesnick et al. Signal restoration with overcomplete wavelet transforms: Comparison of analysis and synthesis priors
Onuki et al. Graph signal denoising via trilateral filter on graph spectral domain
KR101961177B1 (en) Method and apparatus for processing image based on neural network
US11250600B2 (en) Method for processing X-ray computed tomography image using neural network and apparatus therefor
Ye et al. Deep back projection for sparse-view CT reconstruction
KR102094599B1 (en) Method for processing interior computed tomography image using artificial neural network and apparatus therefor
Grigoryan et al. Optimal Wiener and homomorphic filtration
US11145028B2 (en) Image processing apparatus using neural network and method performed by image processing apparatus
KR102061967B1 (en) Method for processing x-ray computed tomography image using artificial neural network and apparatus therefor
Palakkal et al. Poisson image denoising using fast discrete curvelet transform and wave atom
Pfister et al. Model-based iterative tomographic reconstruction with adaptive sparsifying transforms
Mohsin et al. Iterative shrinkage algorithm for patch-smoothness regularized medical image recovery
Deka et al. Removal of correlated speckle noise using sparse and overcomplete representations
Shen et al. Removal of mixed Gaussian and impulse noise using directional tensor product complex tight framelets
He et al. Wavelet frame-based image restoration using sparsity, nonlocal, and support prior of frame coefficients
Jiao et al. Low-dose CT image denoising via frequency division and encoder-dual decoder GAN
US20220414954A1 (en) Method and apparatus for low-dose x-ray computed tomography image processing based on efficient unsupervised learning using invertible neural network
Grohs et al. A shearlet-based fast thresholded Landweber algorithm for deconvolution
KR102329938B1 (en) Method for processing conebeam computed tomography image using artificial neural network and apparatus therefor

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

AS Assignment

Owner name: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YE, JONGCHUL;HAN, YOSEOB;REEL/FRAME:048725/0648

Effective date: 20190322

Owner name: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY, KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YE, JONGCHUL;HAN, YOSEOB;REEL/FRAME:048725/0648

Effective date: 20190322

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE