US12346997B2 - Method and apparatus for low-dose X-ray computed tomography image processing based on efficient unsupervised learning using invertible neural network - Google Patents
- Publication number
- US12346997B2 (application US17/848,689)
- Authority
- US
- United States
- Prior art keywords
- invertible
- generator
- computed tomography
- image
- dose
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/003—Reconstruction from projections, e.g. tomography
- G06T11/006—Inverse problem, transformation from projection-space into object-space, e.g. transform methods, back-projection, algebraic methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2211/00—Image generation
- G06T2211/40—Computed tomography
- G06T2211/441—AI-based methods, deep learning or artificial neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2211/00—Image generation
- G06T2211/40—Computed tomography
- G06T2211/444—Low dose acquisition or reduction of radiation dose
Definitions
- a method of processing a low-dose X-ray computed tomography image based on unsupervised learning by using an invertible neural network performed by a computer device includes providing an invertible generator for restoring an image, and training the invertible generator to restore a low-dose computed tomography image to a normal computed tomography image.
- the method may further include improving the quality of the low-dose X-ray computed tomography image based on the unsupervised learning by using a single generator neural network and a single discriminator neural network through the provided invertible generator.
- the method may further include allowing the invertible generator to learn a distribution of an image in an invertible operation through the coupling layer.
- the providing of the invertible generator may include performing a squeeze operation and an unsqueeze operation on an input image; and performing an invertible operation through an invertible block between the squeeze operation and the unsqueeze operation.
- the invertible block may include a stable additive coupling layer combined with an invertible 1×1 convolution.
- the training of the invertible generator may include simultaneously obtaining, through the inverse of the invertible generator, a mapping that returns the normal computed tomography image to the low-dose computed tomography image when the invertible generator is trained to restore the low-dose computed tomography image to the normal computed tomography image.
- the training of the invertible generator may include allowing the invertible generator to learn using a wavelet residual image, estimate a noise pattern, and obtain a final image by subtracting the estimated noise pattern.
- an apparatus for processing a low-dose X-ray computed tomography image based on unsupervised learning by using an invertible neural network includes an invertible generator providing unit configured to provide an invertible generator for restoring an image, and a learning device configured to train the invertible generator to restore a low-dose computed tomography image to a normal computed tomography image.
- the quality of the low-dose X-ray computed tomography image based on the unsupervised learning may be improved by using a single generator neural network and a single discriminator neural network through the provided invertible generator.
- the invertible generator providing unit may provide an invertible block including a coupling layer.
- the invertible generator may learn a distribution of an image in an invertible operation through the coupling layer.
- the invertible block may include a stable additive coupling layer capable of being combined with an invertible 1×1 convolution.
- the learning device may allow the invertible generator to learn using a wavelet residual image, estimate a noise pattern, and obtain a final image by subtracting the estimated noise pattern.
- FIG. 1 is a view illustrating a general CycleGAN-based low-dose computed tomography image restoration technology
- FIG. 2 is a diagram illustrating a learning process of CycleGAN learning technology using an invertible generator according to an embodiment
- FIG. 3 is a flowchart illustrating an unsupervised learning-based low-dose X-ray computed tomography image processing method using an invertible neural network according to an embodiment
- FIG. 4 is a block diagram illustrating an unsupervised learning-based low-dose X-ray computed tomography image processing apparatus using an invertible neural network according to an embodiment
- FIG. 5 is a diagram illustrating the analysis of conventional optimal transport-based CycleGAN learning
- FIG. 6 is a diagram illustrating an interpretation of learning of the CycleGAN learning technology using an optimal transport-based invertible generator according to an embodiment
- FIG. 7 is a diagram illustrating an architecture of an invertible block and an invertible generator according to an embodiment
- FIG. 8 is a diagram illustrating a squeeze operation and an unsqueeze operation according to an embodiment
- FIG. 9 is a diagram illustrating an invertible 1×1 convolution and its inverse according to an embodiment
- FIG. 10 is a diagram illustrating a method of forward calculation of a coupling layer handling an image according to an embodiment
- FIG. 11 is a diagram illustrating a method of inverse calculation of a coupling layer handling an image according to an embodiment
- FIG. 12 is a diagram illustrating a method of generating a wavelet residual image according to an embodiment
- FIG. 13 is a diagram illustrating a network trained using a wavelet residual image according to an embodiment
- FIG. 14 is a diagram illustrating the architecture of a neural network in a coupling layer according to an embodiment
- FIG. 15 is a diagram illustrating the architecture of a PatchGAN discriminator according to an embodiment
- FIG. 16 is a diagram illustrating the noise removal result of a low-dose computed tomography image for a conventional cyclic generative adversarial network (CycleGAN) and the proposed method
- FIG. 17 is a diagram illustrating a result of confirming whether an invertible generator performs an appropriate reverse operation according to an embodiment.
- FIG. 18 is a diagram illustrating a noise removal result according to an embodiment.
- Low-dose computed tomography may reduce the risk of cancer in patients by reducing the radiation dose of conventional X-ray computed tomography.
- however, some information is lost or signal noise is introduced, so the image quality is severely degraded.
- CycleGAN has been shown to provide high-performance, ultra-fast noise removal for low-dose X-ray computed tomography (CT) without a paired training dataset.
- CT: computed tomography
- the CycleGAN is possible because cycle consistency is guaranteed, but applying the cycle-consistency constraint requires two generators and two discriminators, demanding significant GPU resources and careful tuning during training.
- a recent proposal of a switchable CycleGAN with adaptive instance normalization (AdaIN) partially alleviates the problem by using a single generator. However, two discriminators and an additional AdaIN code generator for learning are still required.
- the present embodiment proposes a new cycle-free CycleGAN architecture that includes only a single generator and a single discriminator but still guarantees cycle consistency.
- the cycle-free CycleGAN comes from the observation that, when an invertible generator is used, the cycle consistency condition is automatically met and the additional discriminator can be removed from the CycleGAN formulation.
- the network is implemented in the wavelet residual domain. According to embodiments, extensive experiments using low-dose CT images at various dose levels show that the cycle-free CycleGAN can significantly improve noise removal performance while using only 10% of the learnable parameters of a conventional CycleGAN.
- existing low-dose X-ray computed tomography image processing methods require either a supervised learning-based neural network trained on matched (paired) data, or four or more neural networks for learning with unmatched (unpaired) data.
- the low-dose X-ray computed tomography image processing method according to an embodiment develops an invertible generator and applies it to the CycleGAN, enabling learning with unmatched data and unsupervised learning with only two neural networks.
- the cycle-free CycleGAN will be described in more detail below.
- FIG. 1 is a view illustrating a general CycleGAN-based low-dose computed tomography image restoration technology.
- the CycleGAN has an inefficient structure that uses a total of four neural networks: a generator and a discriminator that restore a normal computed tomography image from a low-dose computed tomography image, and a generator and a discriminator that restore the low-dose computed tomography image back from the normal computed tomography image.
- One of the final goals of CycleGAN research for low-dose CT noise removal is to remove unnecessary generators and discriminators while maintaining the optimality of CycleGAN in terms of optimal transport.
- one of the most important contributions of the present invention is to show that the use of an invertible generator architecture makes it possible to completely eliminate one of the discriminators, automatically meeting cycle consistency without affecting the CycleGAN framework. That is, this embodiment provides an invertible generator that restores an image from an image and applies it to CycleGAN, proposing a technique that can omit the process of returning from a normal computed tomography image to a low-dose computed tomography image. Meanwhile, the term 'image' used below may be understood broadly to include pictures.
- the inverse of the invertible generator may perform the function of returning the normal computed tomography image to the low-dose computed tomography image. This may also be understood through an optimal transport-based interpretation.
- in Equation 14, λ and η are regularization parameters, and the last term penalizes deviations introduced by the generator.
- the first two terms in Equation 14 are computed using both x and y, but the last term is computed only from y. In terms of optimal transport, this makes a great difference: the first terms require a dual formulation, while the computational cost of the last term is negligible.
- Equation 15: min_{θ,ϕ} max_{ψ,φ} ℓ(G_θ, F_ϕ; ψ, φ) (15)
- Equation 19: min_θ max_φ ℓ(G_θ; φ) (19)
- Equation 22: |φ*(x) − φ*(x′)| = |ψ(F_ϕ(x)) − ψ(F_ϕ(x′))| ≤(a) (1/κ)‖F_ϕ(x) − F_ϕ(x′)‖ (22)
- where inequality (a) follows from the 1/κ-Lipschitz condition of ψ
- Equation 23: max_φ ∫_X φ(x) dμ(x) − ∫_Y φ(G_θ(y)) dν(y) ≤ max_{φ∈L_1(X)} ∫_X φ(x) dμ(x) − ∫_Y φ(G_θ(y)) dν(y) (23)
- the cycle-free CycleGAN offers several advantages.
- the latent space Z of a normalizing flow (NF) is generally assumed to follow a Gaussian distribution
- the main focus is on image generation from noise in the latent space Z to the ambient space X.
- an empirical result shows that there is information loss due to the limitation of the Gaussian latent variable.
- spaces X and Y may be empirical distributions.
- FIG. 6 is a diagram illustrating an interpretation of learning of the CycleGAN learning technology using an optimal transport-based invertible generator according to an embodiment.
- the transport map from the X distribution to the Y distribution is performed by the inverse of the generator G.
- the most important structure for the invertible generator to learn the image distribution on the invertible operation is a coupling layer. This will be described in more detail below.
- Non-Patent Document 2: non-linear independent components estimation (NICE)
- this method is further extended to the affine coupling layer to increase the expressiveness of a model.
- Non-Patent Document 5 proposes an invertible 1×1 convolution as a generalization of permutation operations, thereby greatly improving the image generation quality of a flow-based generative model.
- FIG. 7 is a diagram illustrating the architecture of an invertible block and an invertible generator according to an embodiment.
- the architecture includes L iterations of squeeze/unsqueeze blocks interleaved with an invertible 1×1 convolution and a stable additive coupling layer.
- the operation of obtaining input x from output y may be reversely performed.
- FIG. 8 is a diagram illustrating a squeeze operation and an unsqueeze operation according to an embodiment.
- the squeeze operation is essential to build the coupling layer, which will soon become apparent.
- the separated channels are rearranged into one image through the inverse of the squeeze operation.
- This operation is applied using the output of the coupling layer so that the unsqueeze output maintains the same spatial dimensions of the input image x.
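The squeeze/unsqueeze pair described above can be sketched in NumPy as a space-to-depth transform and its exact inverse; the particular 2×2 pixel ordering below is an assumption, since the patent does not fix one:

```python
import numpy as np

def squeeze(x):
    """Space-to-depth: split an (H, W) image into 4 half-resolution
    channel components by taking the 4 pixels of each 2x2 block."""
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0
    return np.stack([x[0::2, 0::2], x[0::2, 1::2],
                     x[1::2, 0::2], x[1::2, 1::2]])  # shape (4, H/2, W/2)

def unsqueeze(c):
    """Inverse of squeeze: interleave the 4 channels back into one image,
    so the output keeps the spatial dimensions of the original input."""
    _, h2, w2 = c.shape
    x = np.empty((2 * h2, 2 * w2), dtype=c.dtype)
    x[0::2, 0::2], x[0::2, 1::2] = c[0], c[1]
    x[1::2, 0::2], x[1::2, 1::2] = c[2], c[3]
    return x
```

Because unsqueeze exactly undoes squeeze, this pair is trivially invertible, which is what allows it to sit inside the invertible generator.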
- Invertible 1×1 Convolution: The squeeze operation divides the input into four components along the channel dimension. As a result, only spatial information limited to a fixed channel arrangement passes through the neural network. Accordingly, it has been proposed to randomly permute and invert the channel order (Non-Patent Document 2). Meanwhile, the generative flow using invertible 1×1 convolutions (Glow) proposed an invertible 1×1 convolution with the same number of input and output channels as a generalization of the permutation operation using learnable parameters (Non-Patent Document 5).
- FIG. 9 is a diagram illustrating an invertible 1 ⁇ 1 convolution and vice versa according to an embodiment.
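The invertible 1×1 convolution can be sketched as per-pixel channel mixing with an invertible 4×4 matrix W, matching Equations 24 and 25 later in the document; the particular initialization of W below is illustrative, not the patent's parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Learnable 4x4 channel-mixing matrix, kept well-conditioned (and hence
# invertible) by adding a scaled identity -- an illustrative choice.
W = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)

def conv1x1(x, W):
    """Invertible 1x1 convolution: mix the 4 channels at every pixel,
    i.e. C(x_{1:4}) = x_{1:4} W (Equation 24)."""
    return np.einsum('chw,cd->dhw', x, W)

def conv1x1_inverse(y, W):
    """Exact inverse via W^{-1} (Equation 25)."""
    return np.einsum('chw,cd->dhw', y, np.linalg.inv(W))
```

Since the same W acts at every pixel, invertibility of the whole operation reduces to invertibility of the small 4×4 matrix.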
- the coupling layer is an essential component that provides the expressiveness of a neural network while providing reversibility.
- the additive coupling layer of NICE (Non-Patent Document 2) is based on splitting a sequence into even and odd components, to which neural networks are then applied alternately.
- the input image is divided into 4 channel blocks and further extended with a general coupling layer to which a neural network is applied at every step.
- the separated inputs may be processed more efficiently by applying a general invertible transform.
- the stable coupling layer is given by the following equation.
- y_1 = x_1 + F_1([x_2, x_3, x_4])
- y_2 = x_2 + F_2([y_1, x_3, x_4])
- y_3 = x_3 + F_3([y_1, y_2, x_4])
- y_4 = x_4 + F_4([y_1, y_2, y_3])
- FIG. 10 is a diagram illustrating a method of forward calculation of a coupling layer handling an image according to an embodiment.
- FIG. 11 is a diagram illustrating a method of inverse calculation of a coupling layer handling an image according to an embodiment.
- a method using a single addition operation, that is, the forward operation of a coupling layer handling an image according to an embodiment, is illustrated.
- the image is divided into four divided independent images.
- in the forward operation of the coupling layer, three of the divided images pass through the neural network and the result is added to the remaining image; the three divided images themselves are kept unchanged.
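The forward and inverse passes of the stable additive coupling layer (Equations 26 and 27) can be sketched as follows; the stand-in functions F1..F4 replace the patent's small CNNs, which is harmless here because invertibility does not depend on what the coupling networks compute:

```python
import numpy as np

# Stand-ins for the coupling networks F1..F4 (the patent uses small CNNs;
# any function of the other channels preserves invertibility, since its
# output is only ever added and can therefore be subtracted back out).
F = [lambda a: np.tanh(a.sum(axis=0)) for _ in range(4)]

def coupling_forward(x):
    """Stable additive coupling (Equation 26): each output channel is the
    matching input channel plus a network applied to the other channels."""
    x1, x2, x3, x4 = x
    y1 = x1 + F[0](np.stack([x2, x3, x4]))
    y2 = x2 + F[1](np.stack([y1, x3, x4]))
    y3 = x3 + F[2](np.stack([y1, y2, x4]))
    y4 = x4 + F[3](np.stack([y1, y2, y3]))
    return np.stack([y1, y2, y3, y4])

def coupling_inverse(y):
    """Exact inverse by subtracting in reverse order (Equation 27)."""
    y1, y2, y3, y4 = y
    x4 = y4 - F[3](np.stack([y1, y2, y3]))
    x3 = y3 - F[2](np.stack([y1, y2, x4]))
    x2 = y2 - F[1](np.stack([y1, x3, x4]))
    x1 = y1 - F[0](np.stack([x2, x3, x4]))
    return np.stack([x1, x2, x3, x4])
```

The inverse recovers the channels in reverse order because each subtraction only needs quantities already recovered, mirroring Equation 27.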
- the Lipschitz constant of the invertible generator can be easily confirmed by the matrix norm of W.
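As a sketch of that check, the Lipschitz constant contributed by the 1×1 convolution can be bounded by the spectral norm (largest singular value) of W; using the spectral norm specifically is an assumption, since the patent only says "matrix norm":

```python
import numpy as np

def lipschitz_1x1(W):
    """Lipschitz constant of the channel-mixing 1x1 convolution with
    matrix W: its largest singular value (spectral norm)."""
    return np.linalg.svd(np.asarray(W, dtype=float), compute_uv=False)[0]
```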
- a 20% dose multiphase cardiac CT scan dataset was obtained from 50 CT scans of patients with mitral valve prolapse and 50 CT scans of patients with coronary artery disease. The dataset was collected at Ulsan University College of Medicine and used for research by Gu (Non-Patent Document 1). Electrocardiogram (ECG) gated cardiac CT scans were performed using a second-generation dual-source CT scanner. For low-dose CT scans, the tube current is reduced to 20% of that of normal-dose CT scans. For learning, all values of the dataset are converted to Hounsfield units [HU] and values less than −1024 HU are truncated to −1024 HU. Then, the dataset is divided by 4096 to normalize all data values to [−1, 1]. The 4684 CT images are used to train the network, and the remaining 772 images are used to test the model.
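The preprocessing steps stated above (truncation at −1024 HU, then division by 4096) can be sketched as:

```python
import numpy as np

def preprocess_hu(img_hu):
    """Truncate values below -1024 HU, then divide by 4096, as described
    above. Note that with truncation at -1024 HU the lowest reachable
    value after scaling is -0.25, inside the stated [-1, 1] range."""
    img = np.maximum(np.asarray(img_hu, dtype=np.float64), -1024.0)
    return img / 4096.0
```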
- daub3 wavelets are used and the wavelet decomposition level is set to 6 for all datasets.
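The wavelet residual image (the input minus its low-frequency approximation) can be illustrated with a one-level Haar transform for brevity; the patent itself uses daub3 wavelets at decomposition level 6, so this is a simplified stand-in:

```python
import numpy as np

def haar_lowpass(x):
    """One-level 2-D Haar approximation: average each 2x2 block, then
    upsample back to full resolution (nearest-neighbor)."""
    h, w = x.shape
    avg = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.kron(avg, np.ones((2, 2)))

def wavelet_residual(x):
    """Wavelet residual image: the input minus its low-frequency
    approximation, retaining the high-frequency (noise-carrying) detail."""
    return x - haar_lowpass(x)
```

By construction the residual carries only high-frequency content, which is where the low-dose CT noise concentrates, and adding the low-pass part back recovers the original image exactly.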
- FIG. 14 is a diagram illustrating the architecture of a neural network in a coupling layer according to an embodiment.
- the architecture includes three convolutional layers with spectral normalization, followed by a multi-channel-input, single-channel-output convolution.
- the first and last convolutional layers use a 3 ⁇ 3 kernel with a stride of 1
- the second convolution layer uses a 1 ⁇ 1 kernel with a stride of 1.
- the latent feature map channel size is 256.
- zero padding is applied to the first and last convolutional layers so that the height and width of the feature map are the same as the previous feature map.
- the discriminator is formed based on the PatchGAN architecture.
- the overall structure of the discriminator is shown in FIG. 15, which is based on the PatchGAN discriminator with 4 discriminator layers rather than 5.
- the first two convolutional layers use a stride of 2, and the remaining convolutional layers use a stride of 1.
- No batch normalization is applied after the first and last convolutional layers.
- LeakyReLU with a negative slope of 0.2 is applied after each convolutional layer.
- the discriminator loss is computed as the LSGAN loss.
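A sketch of the LSGAN losses mentioned above, using the common least-squares targets of 1 for real and 0 for fake patch scores (the specific targets are an assumption; the patent only names the loss):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """LSGAN discriminator loss: push real patch scores toward 1 and
    fake patch scores toward 0 via least squares."""
    return 0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def lsgan_g_loss(d_fake):
    """LSGAN generator loss: push fake patch scores toward 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)
```

With a PatchGAN discriminator, d_real and d_fake are arrays of per-patch scores, and the loss averages over all patches.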
- the learning rate was initialized to 1 ⁇ 10 ⁇ 4 and halved after every 50,000 iterations.
- the network was trained for 150,000 iterations on an NVIDIA GeForce RTX 2080 Ti.
- the code according to an embodiment was implemented with Pytorch v1.6.0 and CUDA 10.1.
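The stated learning-rate schedule (initialized to 1×10⁻⁴ and halved after every 50,000 iterations) can be sketched as a step schedule:

```python
def learning_rate(iteration, base_lr=1e-4, halve_every=50_000):
    """Step schedule described above: start at base_lr and halve the
    rate after every `halve_every` iterations."""
    return base_lr * 0.5 ** (iteration // halve_every)
```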
- PSNR: peak signal-to-noise ratio
- SSIM: structural similarity index metric
- x is the input image
- y is the target image
- MAX_x is the maximum possible pixel value of the image x.
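Under the definitions above, PSNR can be sketched as follows, using MAX_x (taken here as the maximum value of the input image x) as the peak value:

```python
import numpy as np

def psnr(x, y, max_val=None):
    """Peak signal-to-noise ratio between input x and target y:
    10 * log10(MAX^2 / MSE), with MAX defaulting to max of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mse = np.mean((x - y) ** 2)
    if max_val is None:
        max_val = x.max()
    return 10.0 * np.log10(max_val ** 2 / mse)
```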
- a method according to an embodiment was compared with an existing unsupervised LDCT noise removal network (Non-Patent Document 1).
- the network performance was compared with the existing CycleGAN based on U-net architecture.
- the method was also compared with the AdaIN-based switchable CycleGAN (Non-Patent Document 1). This shows cutting edge performance for LDCT noise removal.
- Table 1 shows the number of learnable parameters used in the conventional CycleGAN (left), the latest CycleGAN-based noise removal technique (middle), and the proposed noise removal technique (right).
- improved image quality is provided compared to the existing neural network-based low-dose X-ray computed tomography image restoration technique.
- the size of the neural network is reduced to a level that is easy to store and manage even on mobile media including smartphones, so that various applications are possible.
- the embodiments may be applied to low-dose X-ray computed tomography image restoration with reduced radiation dose, and may be applied to various computed tomography techniques.
- the foregoing devices may be realized by hardware elements, software elements and/or combinations thereof.
- the devices and components illustrated in the exemplary embodiments of the inventive concept may be implemented in one or more general-use computers or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor or any device which may execute instructions and respond.
- a processing unit may implement an operating system (OS) or one or more software applications running on the OS. Further, the processing unit may access, store, manipulate, process and generate data in response to execution of software.
- the processing unit may include a plurality of processing elements and/or a plurality of types of processing elements.
- the processing unit may include a plurality of processors or one processor and one controller.
- the processing unit may have a different processing configuration, such as a parallel processor.
- Software may include computer programs, codes, instructions or one or more combinations thereof and may configure a processing unit to operate in a desired manner or may independently or collectively control the processing unit.
- Software and/or data may be permanently or temporarily embodied in any type of machine, components, physical equipment, virtual equipment, computer storage media or units or transmitted signal waves so as to be interpreted by the processing unit or to provide instructions or data to the processing unit.
- Software may be dispersed throughout computer systems connected via networks and may be stored or executed in a dispersion manner.
- Software and data may be recorded in one or more computer-readable storage media.
- the methods according to the above-described exemplary embodiments of the inventive concept may be implemented with program instructions which may be executed through various computer means and may be recorded in computer-readable media.
- the media may also include, alone or in combination with the program instructions, data files, data structures, and the like.
- the program instructions recorded in the media may be designed and configured specially for the exemplary embodiments of the inventive concept or be known and available to those skilled in computer software.
- Computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as compact disc-read only memory (CD-ROM) disks and digital versatile discs (DVDs); magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like.
- Program instructions include both machine codes, such as produced by a compiler, and higher level codes that may be executed by the computer using an interpreter.
- by providing an invertible generator and applying it to CycleGAN, it is possible to learn even with unmatched data and to perform unsupervised learning with only two neural networks, thereby effectively improving the image quality of a low-dose computed tomography reconstructed image while using only one tenth (1/10) of the learnable parameters of the related art.
- the amount of computation required is remarkably reduced, so the size of the neural network shrinks to a level that is easy to store and manage even on mobile devices, including smartphones.
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Algebra (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Apparatus For Radiation Diagnosis (AREA)
Abstract
Description
-
- (Non-Patent Document 1) J. Gu and J. C. Ye, “AdaIN-based tunable CycleGAN for efficient unsupervised low-dose CT denoising,” IEEE Transactions on Computational Imaging, vol. 7, pp. 73-85, 2021.
- (Non-Patent Document 2) L. Dinh, D. Krueger, and Y. Bengio, “NICE: Non-linear independent components estimation,” arXiv preprint arXiv:1410.8516, 2014.
- (Non-Patent Document 3) J. Su and G. Wu, “f-VAEs: Improve VAEs with conditional flows,” arXiv preprint arXiv:1809.05861, 2018.
- (Non-Patent Document 4) E. Cha, H. Chung, E. Y. Kim, and J. C. Ye, “Unpaired training of deep learning tMRA for flexible spatio-temporal resolution,” IEEE Transactions on Medical Imaging, vol. 40, no. 1, pp. 166-179, 2021.
- (Non-Patent Document 5) D. P. Kingma and P. Dhariwal, "Glow: Generative flow with invertible 1×1 convolutions," in NeurIPS, 2018.
[Equation 1]
log p_θ(x) = log(∫ p_θ(x|z) p(z) dz) ≥ −ℓ_ELBO(x; θ, ϕ) (1)
[Equation 2]
ℓ_ELBO(x; θ, ϕ) := −∫ log p_θ(x|z) q_ϕ(z|x) dz + D_KL(q_ϕ(z|x) ‖ p(z)) (2)
[Equation 3]
q_ϕ(z|x) = ∫ δ(z − F_ϕ^u(x)) r(u) du (3)
[Equation 5]
F_ϕ^u(x) = F_ϕ(σu + x) (5)
[Equation 6]
G_θ = F_ϕ^{−1}. (6)
[Equation 10]
F_ϕ(u) = (h_K ∘ h_{K−1} ∘ … ∘ h_1)(u) (10)
Where
ℓ(G_θ, F_ϕ; ψ, φ) := λ ℓ_cycle(G_θ, F_ϕ) + ℓ_GAN(G_θ, F_ϕ; ψ, φ) + η ℓ_y(G_θ) (14)
[Equation 18]
ℓ_y(G_θ) = ∫ ‖y − G_θ(y)‖ dν(y) (18)
Where
[Equation 20]
ℓ(G_θ; φ) := 2 ℓ_GAN(G_θ; φ) + η ℓ_y(G_θ) (20)
φ*(x) = ψ(F_ϕ(x)), ∀x ∈ X
ψ(y) = ψ(F_ϕ(G_θ(y))) = φ*(G_θ(y)),
x_{1:4} := [x_1, x_2, x_3, x_4] = S(x)
x = U(x_{1:4}),
[Equation 24]
C(x_{1:4}) = x_{1:4} W (24)
[Equation 25]
C^{−1}(y_{1:4}) = y_{1:4} W^{−1} (25)
[Equation 26]
y_1 = x_1 + F_1([x_2, x_3, x_4])
y_2 = x_2 + F_2([y_1, x_3, x_4])
y_3 = x_3 + F_3([y_1, y_2, x_4])
y_4 = x_4 + F_4([y_1, y_2, y_3]) (26)
[Equation 27]
x_4 = y_4 − F_4([y_1, y_2, y_3])
x_3 = y_3 − F_3([y_1, y_2, x_4])
x_2 = y_2 − F_2([y_1, x_3, x_4])
x_1 = y_1 − F_1([x_2, x_3, x_4]) (27)
| TABLE 1 | ||
| Conventional CycleGAN | AdaIN CycleGAN [6] | Proposed |
| Network | # of Parameters | Network | # of Parameters | Network | # of Parameters |
| Gθ | 6,251,392 | Gθ | 5,900,865 | Gθ | 1,204,320 |
| Fϕ | 6,251,392 | F | 274,560 | — | — |
| Dx | 2,766,209 | Dx | 2,766,209 | Dx | 662,401 |
| Dy | 2,766,209 | Dy | 2,766,209 | — | — |
| Total | 18,035,202 | Total | 11,707,843 | Total | 1,866,721 |
Claims (20)
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR20210084256 | 2021-06-28 | ||
| KR10-2021-0084256 | 2021-06-28 | ||
| KR10-2021-0130686 | 2021-10-01 | ||
| KR1020210130686A KR102643601B1 (en) | 2021-06-28 | 2021-10-01 | Method and apparatus for low-dose x-ray computed tomography image processing based on efficient unsupervised learning using invertible neural network |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20220414954A1 US20220414954A1 (en) | 2022-12-29 |
| US12346997B2 true US12346997B2 (en) | 2025-07-01 |
Family
ID=84541170
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/848,689 Active 2043-05-22 US12346997B2 (en) | 2021-06-28 | 2022-06-24 | Method and apparatus for low-dose X-ray computed tomography image processing based on efficient unsupervised learning using invertible neural network |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12346997B2 (en) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12125198B2 (en) * | 2021-09-14 | 2024-10-22 | Siemens Healthineers Ag | Image correction using an invertable network |
| CN115456918B (en) * | 2022-11-11 | 2023-03-24 | 之江实验室 | Image denoising method and device based on wavelet high-frequency channel synthesis |
| CN116433500A (en) * | 2023-01-30 | 2023-07-14 | 广东工业大学 | A method, system and readable storage medium for image deraining based on reversible network |
| US12394018B1 (en) | 2024-04-01 | 2025-08-19 | AtomBeam Technologies Inc. | System and methods for low-light image enhancement utilizing denoising preprocessing with wavelet decomposition |
| US12175638B1 (en) * | 2024-04-01 | 2024-12-24 | Atombeam Technologies Inc | System and methods for low-light image enhancement utilizing denoising preprocessing with wavelet decomposition |
| CN119991491B (en) * | 2025-01-24 | 2025-10-03 | 广东工业大学 | A low-dose CT denoising method based on tight-frame wavelet residual diffusion model |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190371018A1 (en) * | 2018-05-29 | 2019-12-05 | Korea Advanced Institute Of Science And Technology | Method for processing sparse-view computed tomography image using neural network and apparatus therefor |
-
2022
- 2022-06-24 US US17/848,689 patent/US12346997B2/en active Active
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190371018A1 (en) * | 2018-05-29 | 2019-12-05 | Korea Advanced Institute Of Science And Technology | Method for processing sparse-view computed tomography image using neural network and apparatus therefor |
Non-Patent Citations (13)
| Title |
|---|
| Cha et al., "Unpaired Training of Deep Learning tMRA for Flexible Spatio-Temporal Resolution," IEEE Transactions on Medical Imaging; vol. 40, No. 1; Jan. 2021. |
| Dinh et al., "NICE: Non-Linear Independent Components Estimation," Accepted as a workshop contribution at ICLR 2015; Département d'informatique et de recherche opérationnelle, Université de Montréal; arXiv:1410.8516v6 [cs.LG]; Apr. 10, 2015. |
| Gu et al, AdaIN-Based Tunable CycleGAN for Efficient Unsupervised Low-Dose CT Denoising, 2021, IEEE Transactions on Computational Imaging, pp. 1-14. (Year: 2021). * |
| Gu et al., "AdaIN-Based Tunable CycleGAN for Efficient Unsupervised Low-Dose CT Denoising," IEEE Transactions on Computational Imaging; vol. 7; 2021. |
| Kingma et al, Glow: Generative Flow with Invertible 1x1 Convolutions, 2018, arXiv: 1807.03039v2, pp. 1-15. (Year: 2018). * |
| Kingma et al., "Glow: Generative Flow with Invertible 1×1 Convolutions," 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, Canada. |
| Kuanar et al, Low Dose Abdominal CT Image Reconstruction: An Unsupervised Learning Based Approach, 2019, IEEE International Conference on Image Processing, pp. 1-5. (Year: 2019). * |
| Kwon et al., "Cycle-free CycleGAN using Invertible Generator for Unsupervised Low-Dose CT Denoising," Electrical Engineering and Systems Science; Image and Video Processing; arXiv:2104.08538 (eess); [Submitted on Apr. 17, 2021]. |
| Ma et al, Low-Dose CT Image Denoising Using a GAN with a Hybrid Loss Function for Noise Learning, 2020, IEEE Digital Object Identifier, 8 (2020) pp. 1-11. (Year: 2020). * |
| Su et al., "f-VAEs: Improve VAEs with Conditional Flows," School of Mathematics, Sun Yat-sen University; School of Hefei University of Technology; https://doi.org/10.48550/arXiv.1809.05861; (Submitted on Sep. 16, 2018). |
| Van der Ouderaa et al, Reversible GANS for Memory-Efficient Image-To-Image Translation, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1-10. (Year: 2019). * |
| Yan et al., "Cycle-Consistent Generative Adversarial Network: Effect on Radiation Dose Reduction and Image Quality Improvement in Ultralow-Dose CT for Evaluation of Pulmonary Tuberculosis," Korean Journal of Radiology, vol. 22(6), pp. 983-993 (2021), 11 pages. |
| Zhu et al, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2020, arXiv: 1703.10593v7, pp. 1-18. (Year: 2020). * |
Also Published As
| Publication number | Publication date |
|---|---|
| US20220414954A1 (en) | 2022-12-29 |
Similar Documents
| Publication | Title |
|---|---|
| US12346997B2 (en) | Method and apparatus for low-dose X-ray computed tomography image processing based on efficient unsupervised learning using invertible neural network |
| Kwon et al. | Cycle-free CycleGAN using invertible generator for unsupervised low-dose CT denoising |
| Gupta et al. | CNN-based projected gradient descent for consistent CT image reconstruction |
| US11037336B2 (en) | Method for processing unmatched low-dose X-ray computed tomography image using neural network and apparatus therefor |
| Ye et al. | Deep back projection for sparse-view CT reconstruction |
| US10991132B2 (en) | Method for processing sparse-view computed tomography image using neural network and apparatus therefor |
| Durmus et al. | Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau |
| Kang et al. | Deep convolutional framelet denosing for low-dose CT via wavelet residual network |
| US10043243B2 (en) | Deep unfolding algorithm for efficient image denoising under varying noise conditions |
| US7187794B2 (en) | Noise treatment of low-dose computed tomography projections and images |
| US20200111194A1 (en) | CT super-resolution GAN constrained by the identical, residual and cycle learning ensemble (GAN-CIRCLE) |
| US20230252614A1 (en) | Image harmonization for deep learning model optimization |
| US12223433B2 (en) | Unsupervised learning method for general inverse problem and apparatus therefor |
| Li et al. | Unpaired low-dose computed tomography image denoising using a progressive cyclical convolutional neural network |
| WO2022226886A1 (en) | Image processing method based on transform domain denoising autoencoder as a priori |
| KR102644841B1 (en) | Multi-domain ultrasound image processing method using single neural network based on unsupervised learning and apparatus therefor |
| US20250238980A1 (en) | Computer-implemented method for image reconstruction |
| Chauhan et al. | UNet with ResNextify and IB modules for low-dose CT image denoising |
| CN112258438A (en) | LDCT image restoration algorithm based on non-paired data |
| Saidulu et al. | RHLNet: robust hybrid loss-based network for low-dose CT image denoising |
| Barbano et al. | Unsupervised knowledge-transfer for learned image reconstruction |
| Jing et al. | Inter-slice consistency for unpaired low-dose CT denoising using boosted contrastive learning |
| Karimi et al. | Reducing streak artifacts in computed tomography via sparse representation in coupled dictionaries |
| KR102643601B1 (en) | Method and apparatus for low-dose X-ray computed tomography image processing based on efficient unsupervised learning using invertible neural network |
| Zhao et al. | Unsupervised and self-supervised learning in low-dose computed tomography denoising: insights from training strategies |
Legal Events
| Code | Title | Description |
|---|---|---|
| AS | Assignment | Owner: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY, KOREA, REPUBLIC OF. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: YE, JONGCHUL; KWON, TAESUNG. REEL/FRAME: 060441/0217. Effective date: 20220621 |
| FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| FEPP | Fee payment procedure | ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STCF | Information on status: patent grant | PATENTED CASE |