CN111862174A - Cross-modal medical image registration method and device - Google Patents

Cross-modal medical image registration method and device

Info

Publication number
CN111862174A
CN111862174A (application CN202010652606.9A; granted as CN111862174B)
Authority
CN
China
Prior art keywords
image
modality
network
cross
deformation field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010652606.9A
Other languages
Chinese (zh)
Other versions
CN111862174B (en)
Inventor
李秀 (Li Xiu)
徐哲 (Xu Zhe)
马露凡 (Ma Lufan)
罗凤 (Luo Feng)
严江鹏 (Yan Jiangpeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202010652606.9A
Publication of CN111862174A
Application granted
Publication of CN111862174B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33: Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10072: Tomographic images
    • G06T2207/10081: Computed x-ray tomography [CT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10072: Tomographic images
    • G06T2207/10088: Magnetic resonance imaging [MRI]

Abstract

A cross-modal medical image registration method, comprising: providing a training set comprising a floating image of a first modality and a reference image of a second modality; inputting the floating image into an image conversion network, which converts it into a converted image of the second modality; inputting the floating image and the reference image into a cross-modal flow sub-network, which outputs a first deformation field; inputting the converted image and the reference image into a single-modal flow sub-network, which outputs a second deformation field; inputting the first deformation field and the second deformation field into a deformation field fusion network, which outputs a final deformation field; inputting the floating image and the final deformation field into a spatial transformation network to obtain the floating image warped by the final deformation field; obtaining a first total loss function from the transformed floating image and the reference image, and performing supervised training of the networks with the goal of minimizing the first total loss function; and inputting the images to be registered into the trained networks to obtain the registered image. The invention can greatly improve the effect of cross-modal medical image registration.

Description

Cross-modal medical image registration method and device
Technical Field
The invention relates to the technical field of medical image registration, in particular to a cross-modal medical image registration method and device.
Background
Medical image registration is an optimization process that aligns a floating image with a reference image based on the appearance of the medical images; the goal is to find the best spatial transformation that aligns the regions of interest in the input images. As a key technology of image-guided therapy, medical image registration attempts to establish anatomical correspondence between different medical images and is applied in many clinical scenarios such as endoscopy, disease diagnosis, surgical guidance and radiotherapy. Medical image registration is a broad research topic: it can be divided into single-modal registration and cross-modal registration according to the types of the images to be registered, and into rigid, affine and deformable registration according to the type of registration transformation.
Traditional registration methods solve for the optimal transformation by iteratively optimizing an image similarity index, which is computationally inefficient. Subsequent work therefore introduced deep learning into the medical image registration task, using a deep neural network to directly estimate the deformable transformation of an input image pair and thereby effectively balancing registration accuracy against computational efficiency. However, obtaining a real deformation field (ground truth) and three-dimensional segmentation labels is very challenging and costly, so registration research has gradually focused on learning registration networks under unsupervised conditions.
Existing cross-modal medical image registration techniques can be divided into two major categories: 1) modifying the loss function of existing single-modal registration, designing a similarity measure for cross-modal images to guide unsupervised deformable registration network learning; 2) M2U (Multimodal-to-Unimodal) registration based on cross-modal image conversion, which converts cross-modal image registration into a single-modal registration task by means of existing image conversion techniques. The two are introduced separately below:
(1) Registration techniques based on cross-modal image similarity measurement
This class of methods performs registration of different-modality images directly based on a cross-modal image similarity loss. Owing to the huge appearance differences between cross-modal images, most traditional single-modal image similarity measures are not suitable for the cross-modal registration task. It is therefore highly desirable to design an effective cross-modal image similarity loss to guide the training of an unsupervised cross-modal registration network. To overcome this challenge, Heinrich et al. proposed the modality independent neighbourhood descriptor (MIND) based on the concept of image self-similarity. MIND is highly robust to the obvious differences between modalities and can effectively characterize the similarity of cross-modal images.
A representative approach in this class is the VoxelMorph framework combined with the MIND similarity metric. The cross-modal similarity metric MIND is used as the loss function and applied directly to the typical unsupervised registration framework VoxelMorph, guiding the network to learn a deformable mapping from the multi-modal input images. The network structure of VoxelMorph is shown in fig. 1: VoxelMorph is an unsupervised deformable registration framework based on a convolutional neural network (CNN). The deep convolutional registration network cascades a UNet with a spatial transformation network structure; it takes the floating image (M) to be registered and the reference image (F) as inputs, learns the deformable mapping between the input images through the registration network g_θ(F, M), and outputs a high-dimensional deformation field φ. The transformed floating image warped(φ) is obtained by spatially warping the floating image M according to the estimated deformation field φ. The loss function of the whole network comprises two parts: 1) the similarity loss between the transformed floating image warped(φ) and the reference image F; 2) a regularization loss that smooths the estimated deformation field φ. The VoxelMorph + MIND cross-modal image registration technique inputs the cross-modal images to be registered into the VoxelMorph network, uses MIND to compute the cross-modal image similarity loss that supervises parameter training, and realizes deformable registration of three-dimensional cross-modal images.
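For concreteness, the optimization objective of such a VoxelMorph + MIND pipeline can be sketched in PyTorch as follows; this is a minimal illustrative sketch, not the reference implementation, and `mind_descriptor` and `spatial_transform` are assumed helpers (the MIND feature extractor and the warping module):

```python
import torch

def mind_loss(warped, fixed, mind_descriptor):
    # Mean absolute difference between the MIND descriptors of the two volumes;
    # mind_descriptor maps [B,1,D,H,W] to a feature volume [B,|R|,D,H,W].
    return torch.mean(torch.abs(mind_descriptor(warped) - mind_descriptor(fixed)))

def smoothness_loss(flow):
    # L2 penalty on the spatial gradients of the deformation field [B,3,D,H,W].
    dz = flow[:, :, 1:] - flow[:, :, :-1]
    dy = flow[:, :, :, 1:] - flow[:, :, :, :-1]
    dx = flow[:, :, :, :, 1:] - flow[:, :, :, :, :-1]
    return (dz ** 2).mean() + (dy ** 2).mean() + (dx ** 2).mean()

def registration_loss(moving, fixed, flow, spatial_transform, mind_descriptor, lam=1.0):
    # Similarity term on the warped image plus smoothness regularization on phi.
    warped = spatial_transform(moving, flow)
    return mind_loss(warped, fixed, mind_descriptor) + lam * smoothness_loss(flow)
```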
(2) M2U registration techniques based on cross-modal image conversion
These techniques complete cross-modal medical image registration by means of an image conversion method; the core idea is to convert the complex cross-modal medical image registration into a simpler single-modal registration task. The overall flow of the image-conversion-based cross-modal registration method is as follows:
1) a cross-modal image conversion network is constructed for the cross-modal medical image data, the aim being to learn the mapping relationships between images of different modalities without paired data. The generative adversarial network (GAN), as represented by Cycle-GAN, is a typical image conversion network. The Cycle-GAN-based cross-modal image conversion process is shown in FIGS. 2a to 2c. To realize the mutual mapping of images between the two image domains X and Y, the Cycle-GAN network contains two domain mapping networks (i.e. generators) and two associated discriminators, as shown in fig. 2a. The generator G maps images from image domain X to image domain Y, i.e. G: X → Y; the generator F maps images from image domain Y to image domain X, i.e. F: Y → X. The discriminator Dx distinguishes real images from image domain X from images converted by the generator F; likewise, Dy distinguishes real images from image domain Y from images converted by the generator G. Fig. 2b shows the process of mapping an image from the original domain X to the target domain Y using generator G and then mapping it back to the original domain X using generator F, in which the discriminator Dy distinguishes real from generated images on image domain Y and computes an adversarial loss; fig. 2c shows the process of mapping an image from the original domain Y to the target domain X using generator F and back to the original domain Y using generator G, with the discriminator Dx distinguishing real from generated images on image domain X and computing the adversarial loss. To ensure that the domain mapping transformations G and F are mutually inverse, Cycle-GAN adds a cycle-consistency loss on top of the discriminator adversarial losses. In short, Cycle-GAN uses a pair of generator sub-networks to estimate the mappings between image domains, uses discriminator sub-networks to judge whether the generated images are real or fake, and supervises network training jointly with the adversarial loss and the cycle-consistency loss (a minimal sketch of these two losses is given after this list);
2) based on the mapping relationship learned by the Cycle-GAN network, images are converted from one modality to the other; the input cross-modal image pair thereby becomes a single-modal pair, and the problem is simplified to single-modal image registration;
3) an unsupervised learning-based deformable registration framework is constructed for the obtained single-modal images: a deep convolutional registration network learns the deformable mapping between the single-modal input images, and an STN module spatially warps the floating image according to the estimated deformation fields (DFs) so that the similarity between the transformed floating image and the reference image is maximized.
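The generator-side objective of such a Cycle-GAN-style converter can be sketched as follows (an illustrative PyTorch sketch using the least-squares GAN form of the adversarial term; the architectures of the generators and discriminators are assumed given):

```python
import torch
import torch.nn.functional as F

def cycle_gan_generator_loss(G, F_gen, Dx, Dy, real_x, real_y, lam_cyc=10.0):
    # G: X -> Y, F_gen: Y -> X; Dx and Dy judge realness on domains X and Y.
    fake_y = G(real_x)
    fake_x = F_gen(real_y)

    # Adversarial terms: the generators try to make the discriminators say "real".
    adv = F.mse_loss(Dy(fake_y), torch.ones_like(Dy(fake_y))) + \
          F.mse_loss(Dx(fake_x), torch.ones_like(Dx(fake_x)))

    # Cycle consistency: X -> Y -> X and Y -> X -> Y should reproduce the inputs.
    cyc = F.l1_loss(F_gen(fake_y), real_x) + F.l1_loss(G(fake_x), real_y)

    return adv + lam_cyc * cyc
```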
However, neither the aforementioned VoxelMorph + MIND-based cross-modal image registration technique nor the M2U registration technique generally achieves a satisfactory registration effect.
The above background disclosure is only for the purpose of assisting understanding of the concept and technical solution of the present invention and does not necessarily belong to the prior art of the present patent application, and should not be used for evaluating the novelty and inventive step of the present application in the case that there is no clear evidence that the above content is disclosed at the filing date of the present patent application.
Disclosure of Invention
In order to solve the technical problems, the invention provides a cross-modal medical image registration method and device, which can greatly improve the accuracy and robustness of cross-modal medical image registration.
In order to achieve the purpose, the invention adopts the following technical scheme:
one embodiment of the invention discloses a cross-modal medical image registration method, which comprises the following steps:
S1: providing a training set comprising a floating image of a first modality and a reference image of a second modality;
S2: inputting the floating image into an image conversion network to convert the floating image from the first modality to the second modality and output a converted image of the second modality;
S3: inputting the floating image and the reference image into a cross-modal flow sub-network, which outputs a first deformation field;
S4: inputting the converted image and the reference image into a single-modal flow sub-network, which outputs a second deformation field;
S5: inputting the first deformation field and the second deformation field into a deformation field fusion network, which superposes the two deformation fields and outputs a final deformation field;
S6: inputting the floating image and the final deformation field into a spatial transformation network to obtain the floating image warped by the final deformation field;
S7: comparing the transformed floating image with the reference image to obtain a first total loss function, and repeatedly executing steps S2-S7 to train the cross-modal flow sub-network, the single-modal flow sub-network, the deformation field fusion network and the spatial transformation network with the goal of minimizing the first total loss function, until training is finished and step S8 is executed;
S8: performing steps S2-S6 on the floating image of the first modality and the reference image of the second modality to be registered to obtain the transformed floating image, namely the registered image (how steps S2-S6 compose is sketched after this list).
Preferably, the image conversion network employs an improved Cycle-GAN network, and the second total loss function of the improved Cycle-GAN network is:

L = L_adv(D1) + L_adv(D2) + λ_cyc·L_cyc + λ_identity·L_identity + λ_MIND·L_MIND

where the adversarial losses L_adv(D1) and L_adv(D2) are the adversarial losses of the two discriminators D1 and D2 of the improved Cycle-GAN network respectively; the cycle-consistency loss L_cyc is a constraint on the reversibility of the transformations of the two generators G1 and G2 of the improved Cycle-GAN network; the identity mapping loss L_identity is a normative constraint imposed by generating converted images within the same modality; the structural consistency loss L_MIND is a constraint on the structural similarity between the original image and the generated image; and λ_cyc, λ_identity, λ_MIND represent the relative importance of L_cyc, L_identity, L_MIND respectively.
Step S7 also includes training the improved Cycle-GAN network with the goal of minimizing the second total loss function.
Preferably, the structural consistency loss L_MIND is:

L_MIND = (1 / (N_1·|R|)) Σ_x ||M(I_r1)(x) − M(G_2(I_r1))(x)||_1 + (1 / (N_2·|R|)) Σ_x ||M(I_r2)(x) − M(G_1(I_r2))(x)||_1

where M denotes the modality independent neighbourhood descriptor MIND, I_r1 denotes the floating image of the first modality, I_r2 denotes the reference image of the second modality, N_1 and N_2 denote the numbers of voxels in images I_r1 and I_r2 respectively, and R denotes a non-local region around voxel x; image G_1(I_r2) refers to the generated image obtained by converting image I_r2 with generator G_1, and image G_2(I_r1) refers to the generated image obtained by converting image I_r1 with generator G_2.
Preferably, the identity mapping loss L_identity is:

L_identity = ||G_1(I_1) − I_1||_1 + ||G_2(I_2) − I_2||_1

where I_1 denotes an image of the first modality and I_2 denotes an image of the second modality; image G_1(I_1) refers to the generated image obtained by converting the first-modality image I_1 with generator G_1, and image G_2(I_2) refers to the generated image obtained by converting the second-modality image I_2 with generator G_2.
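These two added losses can be transcribed almost directly into code; the sketch below follows the G1/G2 notation above and assumes a hypothetical `mind` helper that returns the MIND feature volume of an image:

```python
import torch
import torch.nn.functional as F

def identity_mapping_loss(G1, G2, I1, I2):
    # G1 outputs first-modality images and G2 outputs second-modality images;
    # a generator fed an image already in its output modality should act as
    # the identity.
    return F.l1_loss(G1(I1), I1) + F.l1_loss(G2(I2), I2)

def structural_consistency_loss(mind, Ir1, Ir2, G2, G1):
    # The MIND descriptors of each original image and of its converted
    # counterpart should agree (`mind` is an assumed feature extractor).
    term1 = torch.mean(torch.abs(mind(Ir1) - mind(G2(Ir1))))
    term2 = torch.mean(torch.abs(mind(Ir2) - mind(G1(Ir2))))
    return term1 + term2
```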
Preferably, the cross-modal flow sub-network adopts a UNet network structure, wherein the UNet network structure comprises an encoder and a decoder, and skip connections are adopted between the convolutional layers of the encoder and the decoder.
Preferably, the single-modal flow sub-network adopts a UNet network structure, wherein the UNet network structure comprises an encoder and a decoder, and skip connections are adopted between the convolutional layers of the encoder and the decoder.
Preferably, the deformation field fusion network is a 3D convolutional neural network.
Preferably, the spatial transformation network comprises a spatial grid generator and a sampler: the spatial grid generator generates a sampling grid according to the final deformation field, and the sampler spatially warps the floating image according to the sampling grid.
Preferably, the first total loss function is:

L_total = L_sim(I_r2, I_r1 ∘ φ_fused) + λ·L_smooth(φ_fused)

where φ_fused denotes the final deformation field, I_r1 denotes the floating image of the first modality, I_r2 denotes the reference image of the second modality, I_r1 ∘ φ_fused denotes the floating image after the final warping transformation, the image similarity loss L_sim(I_r2, I_r1 ∘ φ_fused) denotes the image similarity loss between the transformed floating image I_r1 ∘ φ_fused and the image I_r2, L_smooth(φ_fused) denotes the regularization loss constraining the smoothness of the final deformation field φ_fused, and λ is the regularization coefficient.
Another embodiment of the invention discloses a cross-modal medical image registration apparatus, which includes a processor and a readable storage medium storing executable instructions; when executing the instructions, the processor is caused to implement the cross-modal medical image registration method described above.
Compared with the prior art, the invention has the following beneficial effects. In the cross-modal medical image registration method and device, the floating image is first converted to the other modality through an image conversion network, and a deformation field is then estimated under unsupervised conditions by a cross-modal flow sub-network and a single-modal flow sub-network respectively. This dual-stream mechanism effectively integrates the information of the original image and the generated image: the cross-modal stream introduces the texture features of the original image and weakens the interference of unreal artificial features in the generated image on registration, while the single-modal stream effectively suppresses the voxel drift effect caused by the cross-modal stream. The original cross-modal stream and the synthetic single-modal stream are optimized cooperatively, learning a more realistic deformation field from the original floating image and the generated converted image respectively. The deformation fields estimated by the cross-modal and single-modal flow sub-networks are then fused, which greatly improves the accuracy and robustness of cross-modal medical image registration and yields better registration performance. The method and device thereby avoid the poor results of direct cross-modal registration caused by the large appearance differences between medical images of different modalities.
In a further scheme, the improved Cycle-GAN network is adopted as the image conversion network. Compared with the existing Cycle-GAN network, two loss function constraints are added, which strengthens the structural similarity between the generated image and the original image, improves structural fidelity, and prevents the introduction of additional artificial features during image conversion. This addresses the problem that converting the cross-modal image pair to be registered into a single-modal pair with an existing generative adversarial network (GAN) inevitably introduces unreal artificial anatomical features that interfere with the registration process and loses the detailed texture of the original image, thereby reducing registration accuracy.
Drawings
Fig. 1 is a schematic diagram of a conventional VoxelMorph network structure;
FIGS. 2a to 2c are schematic diagrams of image transformation based on a Cycle-GAN network;
FIG. 3a is an original CT modality image;
FIG. 3b is an MR image generated using a Cycle-GAN network;
FIG. 4 is a flowchart illustrating a cross-modality medical image registration method according to a preferred embodiment of the present invention;
FIG. 5 is a dual-stream cross-modal registration network structure based on adversarial learning in an embodiment of the present invention;
FIGS. 6a and 6b are schematic diagrams of a modified Cycle-GAN network in accordance with an embodiment of the present invention;
FIG. 7a is an original CT modality image;
FIG. 7b is an MR image generated using the improved Cycle-GAN network of the present invention;
fig. 8 is a schematic structural diagram of the cross-modal stream sub-network UNet_o / single-modal stream sub-network UNet_s;
fig. 9 is a hardware configuration diagram of a cross-modality medical image registration apparatus according to a preferred embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
The inventors found that the VoxelMorph + MIND-based cross-modal image registration technique in the prior art fails to achieve a satisfactory registration effect because the appearance differences between medical images of different modalities are obvious: registering directly while ignoring the modality difference is difficult, and accuracy cannot be guaranteed; even though this technique modifies the loss function of the earlier single-modal registration task, a robust cross-modal similarity metric remains hard to find. In addition, the prior-art M2U registration technique based on cross-modal image conversion has the defect of inevitably introducing artificial features into the generated images. Most such techniques use a generative adversarial network (GAN) such as Cycle-GAN to realize cross-modal image conversion. Specifically, the Cycle-GAN network uses generators to learn the mapping relationships between the different domains of the input images and uses discriminators to evaluate the authenticity of the generated images in the target domain. The network uses the adversarial loss fed back by the discriminator to guide the generator in learning the image domain mapping, so that the generated images become similar in distribution to the target-domain images while the difference from the original-domain images continuously grows. However, such a method cannot guarantee that the network outputs the desired generated image, because through the adversarial loss the network only learns the distribution relationship in probability. Although the cycle-consistency loss is introduced to constrain the reversibility of the conversions between modalities, it is not enough to ensure that the generated image retains the original image characteristics, because the structural similarity of the images before and after modality conversion is not constrained during network training. Therefore, image conversion methods based on generative adversarial networks inevitably introduce unreal artificial features when generating target-domain images, which increases the mismatching rate in the subsequent registration process. Moreover, the original Cycle-GAN network imposes no normative constraint on the identity mapping of input images within the same modality and is likely to erroneously convert an input image already in the target domain to another domain. Consequently, simply converting cross-modal registration into single-modal registration by means of existing cross-modal image conversion techniques is not robust.
Meanwhile, the target-modality images generated by existing image conversion techniques tend to lose the local texture features of the original image and present large structural differences from it. Taking abdominal CT-to-MR registration as an example, as shown in fig. 3a and 3b, fig. 3a is the original CT modality image and fig. 3b is the MR image generated with a Cycle-GAN network; compared with fig. 3a, fig. 3b has lost local texture features of the image. In the unsupervised registration stage, existing cross-modal registration techniques generally feed the generated images directly into the deformable registration network as floating images for single-modal registration, without including the original images. Since the entire unsupervised registration network estimates the deformable transformation from the appearance features of the input images, the fidelity of the deformation fields (DFs) estimated by the registration network naturally depends on the consistency between the input images and the features of the original images. Because the prior art ignores the auxiliary information provided by the original image during registration, the parameter learning of the registration network is strongly affected by the artificial features in the generated image, so that the network ultimately estimates a distorted deformation field, the original floating image cannot be well aligned to the reference image, and registration accuracy drops.
In addition, unsupervised medical image registration supervises network training with a similarity or error loss between the transformed floating image and the reference image. Because the appearance differences between images of different modalities are huge, the similarity metrics commonly used in unsupervised single-modal registration are no longer suitable for the cross-modal scenario. Many learning-based unsupervised registration techniques measure image similarity in cross-modal registration tasks with indicators such as mutual information (MI) and cross-correlation (CC) and use them to guide the learning of network parameters. However, directly migrating these similarity metrics to the cross-modal registration task cannot effectively describe image similarity, so the network learns in the wrong direction, guided by a biased image similarity signal.
The aim of the invention is to incorporate the original image information into the registration process through a dual-stream registration-field fusion design, helping the network robustly learn a more realistic deformation field and thereby obtain better registration performance. To let the original image information guide the cross-modal registration process, the invention effectively utilizes the deformation fields estimated by the original cross-modal stream and the synthetic single-modal stream, and automatically learns, through a convolutional network, how to better fuse the two deformation fields. Meanwhile, to avoid introducing unreal artificial features during image conversion, two loss function constraints are added to the Cycle-GAN to improve the fidelity of the anatomical structures in the generated images.
As shown in fig. 4, a preferred embodiment of the present invention provides a cross-modal medical image registration method for registering a floating image of a first modality to a reference image of a second modality, comprising the following steps:
S1: providing a training set comprising a floating image of a first modality and a reference image of a second modality;
S2: inputting the floating image into an image conversion network to convert it from the first modality to the second modality and output a converted image of the second modality;
S3: inputting the floating image and the reference image into a cross-modal flow sub-network, which outputs a first deformation field;
S4: inputting the converted image and the reference image into a single-modal flow sub-network, which outputs a second deformation field;
S5: inputting the first deformation field and the second deformation field into a deformation field fusion network, which superposes the two deformation fields and outputs a final deformation field;
S6: inputting the floating image and the final deformation field into a spatial transformation network to obtain the floating image warped by the final deformation field;
S7: comparing the transformed floating image with the reference image to obtain a first total loss function, and repeatedly executing steps S2-S7 to perform supervised training of the cross-modal flow sub-network, the single-modal flow sub-network, the deformation field fusion network and the spatial transformation network with the goal of minimizing the first total loss function, until training is completed and step S8 is executed;
S8: performing steps S2-S6 on the floating image of the first modality and the reference image of the second modality to be registered to obtain the transformed floating image, namely the registered image.
The complete flow of the cross-modal medical image registration method provided by the invention is shown in fig. 5 and can be divided into two parts: a cross-modal image conversion network based on the improved Cycle-GAN, and a dual-stream cross-modal image registration network. The following description takes registration from a floating CT image to a reference MR image as an example, but the method of the present invention is not limited to CT-MR cross-modal medical image registration and applies equally to other cross-modal registrations, such as magnetic resonance-ultrasound (MR-US) registration and computed tomography-ultrasound (CT-US) registration.
With reference to the adversarial-learning-based dual-stream registration-fusion cross-modal medical image registration network structure in fig. 5, the overall registration steps include:
A1: providing a training set comprising floating images rCT of the CT modality and reference images rMR of the MR modality;
A2: inputting the original CT-modality floating image rCT into the improved Cycle-GAN image conversion network, which converts the floating image from the CT modality to the MR modality and outputs the generated image tMR;
As a state-of-the-art image conversion model, the Cycle-GAN network can be trained without requiring paired CT and MR data from the same patient. The image conversion Cycle-GAN network model used in the present invention is shown in fig. 6a and 6b. Fig. 6a depicts the forward conversion (CT-to-MR) and backward conversion (MR-to-CT) of CT-modality and MR-modality images, where the solid lines represent the forward conversion (from rCT-1 to tMR-1) and backward conversion (from tMR-1 to tCT-1) processes of the original CT-modality image, and the dashed lines represent the forward conversion (from rMR-2 to tCT-2) and reverse conversion (from tCT-2 to tMR-2) processes of the real MR-modality image. The Cycle-GAN image conversion network consists of two generators G_MR and G_CT and two discriminators D_CT and D_MR. The generator G_MR converts images from the CT modality to the MR modality (CT-to-MR); for example, G_MR takes rCT-1 as input and outputs the generated image tMR-1. The generator G_CT converts images from the MR modality to the CT modality (MR-to-CT); for example, G_CT takes rMR-2 as input and outputs the generated image tCT-2. The discriminator D_CT distinguishes real CT-modality images from the generated images converted by generator G_CT, e.g. distinguishing the real CT image rCT-3 from the generated image tCT-2. Likewise, the discriminator D_MR distinguishes real MR-modality images from the generated images converted by generator G_MR, e.g. distinguishing the real image rMR-3 from the generated image tMR-1. FIG. 6b shows the identity mapping loss constraint of the improved Cycle-GAN network of the present invention on image conversion within the same modality.
The loss function of the improved Cycle-GAN network in the invention comprises four parts: (1) the adversarial losses L_adv(D_CT) and L_adv(D_MR) from the discriminators. The adversarial loss penalizes the difference between the data distributions of the generated images and the real images of the target modality, so that the images converted by a generator have a data distribution highly similar to that of the target modality and are difficult for the discriminator to distinguish; (2) the cycle-consistency loss L_cyc, a constraint on the invertibility of the transformations of the two generators G_CT and G_MR: an image converted by generator G_CT and then converted back by generator G_MR should return to the original modality and remain highly similar to the original image in data distribution. For example, the original CT-modality image rCT-1 is converted to tMR-1 by generator G_MR, and generator G_CT converts the image tMR-1 back to the original CT modality to obtain tCT-1, which has the same data distribution as the original image rCT-1; (3) the structural consistency loss L_MIND, which constrains the structural similarity between the original image and the generated image, the purpose being to ensure that the image converted by a generator retains structural features highly consistent with the original image. For example, the original image rCT-1 in the figure is converted to tMR-1 by generator G_MR; under the structural consistency loss L_MIND, rCT-1 and tMR-1 have a high degree of structural similarity; (4) the identity mapping loss L_identity, which, as shown in FIG. 6b, imposes a normative constraint on the converted images generated within the same modality: under the training constraint of the identity mapping loss L_identity, image conversion within the same modality should leave the image unchanged. For example, the tMR obtained by converting the MR-modality image rMR in FIG. 6b with generator G_MR should be the same as the original image.
The structural consistency loss L_MIND uses MIND to measure the structural similarity between an original-modality image and the target-modality image obtained after conversion by a generator, such as the structural similarity between rCT-1 and tMR-1; MIND describes the local structural features around each pixel. L_MIND is highly robust to the obvious differences between modalities and is used to constrain the structural consistency between the generated image and the original image. L_MIND guides network training to continuously reduce the MIND loss between the generated image G_CT(I_rMR) or G_MR(I_rCT) and the image I_rMR or I_rCT, thereby strengthening the structural similarity between the images before and after conversion.
The structural consistency loss L_MIND used in the present invention is defined as equation (1):

L_MIND = (1 / (N_MR·|R|)) Σ_x ||M(I_rMR)(x) − M(G_CT(I_rMR))(x)||_1 + (1 / (N_CT·|R|)) Σ_x ||M(I_rCT)(x) − M(G_MR(I_rCT))(x)||_1    (1)

where M denotes MIND, I_rMR denotes the reference image of the MR modality, I_rCT denotes the floating image of the CT modality, N_MR and N_CT denote the numbers of voxels in images I_rMR and I_rCT respectively, and R denotes the non-local region around voxel x; image G_CT(I_rMR) refers to the generated image obtained by converting image I_rMR with generator G_CT (also denoted tCT), and image G_MR(I_rCT) refers to the generated image obtained by converting image I_rCT with generator G_MR (also denoted tMR).
In addition, the invention imposes a normative constraint on the converted images generated within the same modality through the identity mapping loss L_identity of the Cycle-GAN network, where the identity mapping loss L_identity is shown in equation (2):

L_identity = ||G_MR(I_MR) − I_MR||_1 + ||G_CT(I_CT) − I_CT||_1    (2)

where G_MR(I_MR) denotes the MR-modality generated image obtained after converting the MR-modality image I_MR with generator G_MR, and G_CT(I_CT) denotes the CT-modality generated image obtained after converting the CT-modality image I_CT with generator G_CT. L_identity computes the L1 distance between the generated image and the real image within the same modality, i.e. the sum of the L1 distances between G_MR(I_MR) and I_MR and between G_CT(I_CT) and I_CT. Specifically, under the constraint of the identity mapping loss L_identity, image conversion within the same modality should remain unchanged, i.e. G_MR(I_MR) ≈ I_MR and G_CT(I_CT) ≈ I_CT. The identity mapping loss L_identity prevents a generator from erroneously converting an image already in the target modality to another modality.
In conclusion, the total loss L of the improved Cycle-GAN network in the invention is the weighted sum of the adversarial losses L_adv(D_CT) and L_adv(D_MR), the cycle-consistency loss L_cyc, the identity mapping loss L_identity and the structural consistency loss L_MIND, defined as shown in equation (3):

L = L_adv(D_CT) + L_adv(D_MR) + λ_cyc·L_cyc + λ_identity·L_identity + λ_MIND·L_MIND    (3)

where λ_cyc, λ_identity and λ_MIND represent the relative importance of the cycle-consistency loss L_cyc, the identity mapping loss L_identity and the structural consistency loss L_MIND respectively.
In this step, the improved Cycle-GAN network is adopted to convert the original CT-modality image rCT, as shown in fig. 7a and 7b, where fig. 7a is the original CT modality image and fig. 7b is the MR image generated with the improved Cycle-GAN network. Compared with fig. 7a, fig. 7b preserves the local texture features of the image; therefore, from the visualization results of fig. 7a and 7b, the addition of the structural consistency loss L_MIND effectively strengthens the structural similarity between the generated image tMR and the original image rCT and improves the fidelity of organ boundaries.
The training loss of the existing Cycle-GAN network includes only two terms: the adversarial losses L_adv(D_CT) and L_adv(D_MR) given by the discriminators, and the cycle-consistency loss L_cyc. The adversarial loss guides the network to learn the mapping relationships between images of different modalities, and the cycle-consistency loss constrains the reversibility of the mapping transformations. However, the inventors found that it is difficult to train a robust cross-modal medical image conversion network relying on these two loss function constraints alone, because the cycle-consistency loss is not enough to guarantee the structural similarity between the generated image and the original image (as shown by the comparison between fig. 3a and fig. 3b); moreover, the existing Cycle-GAN network does not impose a normative constraint on the identity mapping of input images within the same modality, and is likely to erroneously convert an input image already in the target domain to another domain. Therefore, this step improves the Cycle-GAN network by introducing two additional loss functions: the structural consistency loss L_MIND and the identity mapping loss L_identity. A total of four losses are used to constrain the training of the Cycle-GAN network, thereby ensuring the structural similarity between the generated image and the original image (as shown by the comparison of fig. 7a and 7b) and avoiding the erroneous conversion of input images already in the target domain to another domain.
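Assembling the four terms, the generator-side objective of equation (3) can be sketched by composing the helper functions from the earlier sketches (cycle_gan_generator_loss from the background section, identity_mapping_loss and structural_consistency_loss from the summary section); the weights are illustrative assumptions:

```python
def improved_cyclegan_loss(G_mr, G_ct, D_mr, D_ct, real_ct, real_mr, mind,
                           lam_cyc=10.0, lam_id=5.0, lam_mind=1.0):
    # Equation (3): original adversarial + cycle terms plus the two new
    # constraints; `mind` is the assumed MIND feature extractor.
    adv_and_cyc = cycle_gan_generator_loss(G_mr, G_ct, D_ct, D_mr,
                                           real_ct, real_mr, lam_cyc=lam_cyc)
    ident = identity_mapping_loss(G_ct, G_mr, real_ct, real_mr)
    mind_term = structural_consistency_loss(mind, real_ct, real_mr, G_mr, G_ct)
    return adv_and_cyc + lam_id * ident + lam_mind * mind_term
```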
A3: cross-modality streaming network with original CT modality floating image rCT and MR modality reference image rMR as inputs (i.e., the inputs are cross-modality image pairs (rCT, rMR)), through UNet structural networkingOutput deformation field by learning deformable mapping between input image pair
Figure BDA0002575541000000131
Wherein the deformation field
Figure BDA0002575541000000132
I.e. a deformable mapping relationship representing the input cross-modal image pair (rCT, rMR);
in this embodiment, a UNet network structure is adopted for the cross-modal flow subnet UNet _ o. As shown in FIG. 8, the original CT modality image (rCT) and the MR modality image (rMR) are referred to as floating images I, respectivelymAnd a reference picture IfGrayscale float image I with channel number 1mAnd the number of channels is 1fIs shown bym and IfAnd splicing according to the channel direction to obtain three-dimensional volume images of the two channels as input images. The UNet network adopts an encoder-decoder structure, 3D convolution with the step size of 2 is adopted in an encoder part to reduce the spatial resolution of an input image, and a 3D up-sampling layer is adopted in a decoder to restore the spatial resolution of the image. Using a jump connection between convolutional layers of an encoder and a decoder to fuse shallow features and deep features; the number of channels per convolutional layer output signature is shown as the number at the top of the rectangular convolutional layer in fig. 8. Learning deformable transformation parameters among cross-modal input images through a three-dimensional depth convolution network, and outputting to obtain a 3-channel deformation field
Figure BDA0002575541000000133
At this step, the raw cross-modality flow incorporates the raw image rCT into a cross-modality registration framework so that the model can estimate the deformation field based on the detail texture features provided in rCT
Figure BDA0002575541000000134
The introduction of the raw information assists the model in learning a more realistic deformable transformation, which may reduce the interfering effects of the artificial features in the generated image tMR on the registration.
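A minimal 3D UNet-style flow estimator in the spirit of UNet_o / UNet_s might look as follows (channel widths, depth and activation are illustrative assumptions, not the exact configuration of fig. 8):

```python
import torch
import torch.nn as nn

class FlowUNet3D(nn.Module):
    """Minimal 3D encoder-decoder flow estimator with skip connections."""
    def __init__(self, in_ch=2, base=16):
        super().__init__()
        self.enc1 = nn.Conv3d(in_ch, base, 3, stride=2, padding=1)     # downsample
        self.enc2 = nn.Conv3d(base, base * 2, 3, stride=2, padding=1)  # downsample
        self.dec2 = nn.Conv3d(base * 2, base * 2, 3, padding=1)
        self.dec1 = nn.Conv3d(base * 2 + base, base, 3, padding=1)     # skip from enc1
        self.flow = nn.Conv3d(base + in_ch, 3, 3, padding=1)           # 3-channel field
        self.up = nn.Upsample(scale_factor=2, mode='trilinear', align_corners=True)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, moving, fixed):
        x = torch.cat([moving, fixed], dim=1)   # concatenate along channel direction
        e1 = self.act(self.enc1(x))
        e2 = self.act(self.enc2(e1))
        d2 = self.act(self.dec2(e2))
        d1 = self.act(self.dec1(torch.cat([self.up(d2), e1], dim=1)))
        return self.flow(torch.cat([self.up(d1), x], dim=1))
```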
A4: single mode streaming network using the previously improved CycOutput of le-GAN network to generate image tMR and MR modality reference image rMR as input (i.e. the input is a single modality image pair (tMR, rMR)), learning a deformable mapping between the input image pair using the same UNet network as across the modal flow, outputting a deformation field
Figure BDA0002575541000000135
Wherein the deformation field
Figure BDA0002575541000000136
I.e. a deformable mapping relationship representing the input pair of monomodal images (tMR, rMR);
in the present embodiment, the single-mode streaming sub-network UNet _ s also employs the same UNet network architecture as the cross-mode streaming sub-network UNet _ o as shown in fig. 8. The only difference is that UNet _ o is a cross-modal input, while UNet _ s is a single-modal input. The network may convert the original CT modality image (rCT) to an MR modality image (tMR) by image conversion. The single-mode streaming network UNet _ s inputs the generated tMR image and the rMR image into the network as a floating image and a reference image respectively, learns the deformable mapping between the single-mode input images through a three-dimensional depth convolution network, and finally outputs a 3-channel deformation field
Figure BDA0002575541000000137
In the step, the synthesized single-mode flow can learn more texture information in the single-mode image, and effectively inhibits the voxel drift phenomenon caused by the cross-mode flow.
A5: deformation field fusion network deformation field estimated with the first two streams (cross-modal and single-modal)
Figure BDA0002575541000000141
And
Figure BDA0002575541000000142
for inputting, the convolution network is adopted to carry out mixed superposition on the two deformation fields, and the final deformation field is output
Figure BDA0002575541000000143
Wherein the cross-modal and single-modal streaming subnetworks estimate the deformation field based on the cross-modal input (rCT and rMR) and the single-modal input (tMR and rMR), respectively
Figure BDA0002575541000000144
And
Figure BDA0002575541000000145
the deformation field in this step
Figure BDA0002575541000000146
And
Figure BDA0002575541000000147
performing mixed superposition, adopting a convolution neural network with convolution kernel size of 3 multiplied by 3 to effectively fuse two deformation fields, and outputting a final deformation field
Figure BDA0002575541000000148
Wherein the 3D volume deformation field
Figure BDA0002575541000000149
Has the following advantages
Figure BDA00025755410000001410
And
Figure BDA00025755410000001411
the dimensions are all 3 channels.
A6: spatial transformation network based on the final deformation field
Figure BDA00025755410000001412
The original CT mode floating image rCT is spatially warped by
Figure BDA00025755410000001413
Representing, obtaining a transformed floating image (moved CT);
network convergence based on deformation fieldTo the final deformation field
Figure BDA00025755410000001414
In this step, the floating image rCT is spatially warped by means of a Spatial Transform Network (STN). In this embodiment, the STN comprises a spatial grid generator and a sampler, and the deformation field can be predicted according to the grid
Figure BDA00025755410000001415
A sampling grid is generated and then spatially warped rCT by the sampler.
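The sampler of such an STN can be realized with grid_sample; the following sketch (assuming the displacement field is expressed in voxels, PyTorch ≥ 1.10) builds the sampling grid from the predicted field and warps the volume:

```python
import torch
import torch.nn.functional as F

def warp3d(image, flow):
    """Warp a volume [B,1,D,H,W] with a displacement field [B,3,D,H,W]."""
    B, _, D, H, W = image.shape
    # Identity sampling grid in voxel coordinates (z, y, x order).
    zz, yy, xx = torch.meshgrid(torch.arange(D), torch.arange(H), torch.arange(W),
                                indexing='ij')
    grid = torch.stack([zz, yy, xx]).float().to(image.device)   # [3,D,H,W]
    coords = grid.unsqueeze(0) + flow                            # displaced coordinates
    # Normalize each axis to [-1, 1] as grid_sample expects.
    for i, size in enumerate([D, H, W]):
        coords[:, i] = 2.0 * coords[:, i] / (size - 1) - 1.0
    # Reorder to [B,D,H,W,3] with the last dimension in (x, y, z) order.
    coords = coords.permute(0, 2, 3, 4, 1)[..., [2, 1, 0]]
    return F.grid_sample(image, coords, align_corners=True)
```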
A7: calculating the training loss of the network, comprising two parts: loss of image similarity between the transformed floating image and the reference image; smoothing of the final deformation field
Figure BDA00025755410000001416
Loss of regularization. And repeating the steps A2-A7 to carry out supervised training on the dual-stream cross-mode image registration network by taking a minimum loss function as a target until the training is finished and the step A8 can be carried out to directly register the output registration image by using the network.
This embodiment provides a dual-stream cross-modal image registration network comprising a cross-modal stream sub-network, a single-modal stream sub-network, a deformation field fusion network and a spatial transformation network (STN). The training of the dual-stream cross-modal image registration network is similar to multi-adversarial training: the cross-modal stream and the single-modal stream are mutually independent yet mutually constrained, and the whole unsupervised registration network is optimized cooperatively. In the design of the optimization objective, the loss function of the network includes two terms: the image similarity loss L_sim and the regularization loss L_smooth. The similarity loss L_sim describes the similarity between the transformed floating image rCT ∘ φ_fused and the reference image rMR; in this embodiment, the structural similarity index, which is independent of image brightness and contrast, is used to measure image similarity. Here rCT ∘ φ_fused denotes applying the final deformation field φ_fused estimated by the registration network to the original floating image rCT, i.e. spatially warping rCT to obtain the transformed floating image. In addition, the regularization loss L_smooth imposes a smoothness constraint on the deformation field φ_fused estimated by the network; in this embodiment, the L2 norm is used to regularize the gradient of the final deformation field φ_fused.
In summary, the total loss function L_total of the dual-stream cross-modal image registration network proposed in this embodiment is the weighted sum of the image similarity loss L_sim and the regularization loss L_smooth. The total loss L_total is defined as shown in equation (4):

L_total = L_sim(I_rMR, I_rCT ∘ φ_fused) + λ·L_smooth(φ_fused)    (4)

where L_sim denotes the image similarity loss between the transformed floating image I_rCT ∘ φ_fused and the MR-modality reference image I_rMR, and L_smooth denotes the regularization loss constraining the smoothness of the final deformation field φ_fused. I_rMR denotes the reference image of the MR modality, I_rCT denotes the floating image of the original CT modality, φ_fused denotes the final deformation field output by the dual-stream cross-modal registration network, I_rCT ∘ φ_fused denotes the transformed floating image obtained by warping the original CT-modality floating image I_rCT according to the final deformation field φ_fused, and λ is the regularization coefficient.
By minimizing the total loss function L_total, this embodiment simultaneously maximizes the similarity between the transformed floating image and the reference image (i.e. minimizes the image similarity loss L_sim) and keeps the deformation field smooth (i.e. minimizes the regularization loss L_smooth); the dual-stream cross-modal image registration network is trained under the supervision of minimizing the total loss function L_total.
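Combining the pieces above, one training iteration of step A7 could be sketched as follows, reusing the register(...) forward pass and the smoothness_loss(...) penalty from the earlier sketches; the SSIM-based sim_loss and the optimizer are assumed given:

```python
def train_step(moving_ct, fixed_mr, nets, optimizer, sim_loss, lam=1.0):
    # Forward pass A2-A6: translate, estimate both fields, fuse, warp.
    # `nets` is the 5-tuple (translator, flow_cross, flow_mono, fusion, stn).
    warped, phi_fused = register(moving_ct, fixed_mr, *nets)
    # Equation (4): image dissimilarity plus smoothness regularization.
    loss = sim_loss(warped, fixed_mr) + lam * smoothness_loss(phi_fused)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```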
In summary, the dual-stream cross-modal registration technique proposed in this embodiment lets the network estimate the final deformation field using the information of the original image and of the generated image at the same time. With this learning framework, robust and efficient registration of cross-modal medical images can be achieved in a completely unsupervised manner. Moreover, the dual-stream cross-modal image registration network effectively combines the original cross-modal stream and the synthetic single-modal stream, making full use of the information of the original floating image rCT, the reference image rMR and the generated image tMR. The dual-stream cross-modal image registration network therefore overcomes the problems of a high mismatching rate and distorted deformation fields that arise when conversion-based registration techniques introduce unreal features into the generated image.
A8: inputting the floating image rCT of the CT mode and the reference image rMR of the MR mode to be registered into the trained double-current cross-mode image registration network to obtain a transformed floating image
Figure BDA0002575541000000158
I.e. the registered image.
In this embodiment, given a floating image rCT of any CT modality and a reference image rMR of an MR modality, rCT converted to an MR type image tMR using a modified Cycle-GAN network, then original cross-modality flow is used and a single mode is synthesized Estimation of two deformation fields separately for the current
Figure BDA0002575541000000159
And
Figure BDA00025755410000001510
fusion through 3D convolutional networks
Figure BDA00025755410000001511
And
Figure BDA00025755410000001512
the final deformation field is obtained
Figure BDA00025755410000001513
The warping transformation of floating image rCT is implemented by means of a Spatial Transformation Network (STN). The goal of the entire unsupervised cross-modal registration network is to maximize the transformed floating images
Figure BDA00025755410000001514
And the reference image rMR.
The embodiment of the invention provides a novel dual-stream cross-modal medical image registration technique based on adversarial learning, which realizes CT-MR cross-modal registration under unsupervised conditions, makes up for the shortcomings of existing image-conversion-based registration techniques, and improves the accuracy and robustness of unsupervised cross-modal medical image registration.
The dual-stream cross-modal registration network provided by the embodiment of the invention mainly comprises two parts: 1. a cross-modal image conversion network based on the improved Cycle-GAN, which converts the CT-modality floating image rCT into the MR-modality generated image tMR; 2. a cross-modal image registration network based on dual-stream registration-field fusion, which is divided into four parts: the cross-modal stream sub-network, the single-modal stream sub-network, the deformation field fusion network and the spatial transformation network (STN).
In the invention, the cross-modal image conversion model Cycle-GAN is improved by adding the structural consistency loss and the identity mapping loss to the network loss function, which significantly strengthens the structural similarity between the generated image and the original image. To quantitatively evaluate the performance of the improved Cycle-GAN model, the structural similarity index (SSIM) and the peak signal-to-noise ratio (PSNR) were used to measure the quality of MR images generated from CT images on two datasets (the Pig Ex-vivo Kidney CT-MR dataset and the Abdomen (ABD) CT-MR dataset). The SSIM index measures the structural similarity between the images before and after cross-modal conversion, while the PSNR index evaluates the quality of the generated image relative to the original image; higher values of both indices are better. As shown in Table 1, the improved Cycle-GAN network proposed by the invention performs better than the existing original Cycle-GAN network.
TABLE 1 Cross-modal image conversion experiment result comparison
[Table 1 appears as an image in the original document; its values are not reproduced here.]
Under unsupervised conditions, the invention provides a dual-stream cross-modal registration framework: the cross-modal stream introduces the texture features of the original image and weakens the interference of unreal artificial features in the generated image on registration, while the single-modal stream effectively suppresses the voxel drift effect caused by the cross-modal stream. The original cross-modal stream and the synthetic single-modal stream are optimized cooperatively, estimating a more realistic deformation field based on the original CT image and the generated MR image respectively. A 3D convolutional network acting as the fusion module automatically learns how to better fuse the two deformation fields, so as to obtain better registration performance.
In the following, the Dice coefficient and the target registration error (TRE) are used to evaluate the performance of different cross-modal registration models. The Dice coefficient measures the degree of overlap between the reference image and the floating image after transformation by the STN module; higher Dice values are better. TRE is an index dedicated to measuring the accuracy of a registration algorithm: it represents the distance (in mm) between corresponding target points on the registered image and the reference image, and lower TRE values indicate better registration performance.
The evaluation results on two clinical datasets (shown in Table 2) demonstrate the effectiveness of the invention: compared with a traditional cross-modal registration algorithm (VoxelMorph + MIND) and another deep-learning-based cross-modal medical image registration technique (M2U), the dual-stream cross-modal medical image registration technique provided by the invention is significantly superior in registration accuracy.
TABLE 2 Cross-modal image registration experiment result comparison
(Table 2 is reproduced only as an image in the original publication; its Dice and TRE values are not recoverable from the text.)
Fig. 9 is a schematic diagram of the hardware structure of a cross-modality medical image registration apparatus according to another preferred embodiment of the invention. The apparatus may include a processor 901 and a readable storage medium 902 storing executable instructions. The processor 901 and the readable storage medium 902 may communicate via a system bus 903. By reading and executing the executable instructions in the readable storage medium 902, the processor 901 may perform the cross-modality medical image registration method described above.
The readable storage medium 902 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the readable storage medium may be: non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state disk, any type of storage disc (e.g., a compact disc, a DVD, etc.), or a similar storage medium, or a combination thereof.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or a removable disk), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the spirit of the invention, and all such substitutions and modifications shall be deemed to fall within the protection scope of the invention.

Claims (10)

1. A cross-modality medical image registration method is characterized by comprising the following steps:
S1: providing a training set comprising a floating image of a first modality and a reference image of a second modality;
S2: inputting the floating image into an image conversion network to convert the floating image from the first modality to the second modality and output a converted image of the second modality;
S3: inputting the floating image and the reference image into a cross-modal flow sub-network and outputting a first deformation field;
S4: inputting the converted image and the reference image into a single-modal flow sub-network and outputting a second deformation field;
S5: inputting the first deformation field and the second deformation field into a deformation field fusion network to fuse the first deformation field and the second deformation field and output a final deformation field;
S6: inputting the floating image and the final deformation field into a spatial transformation network to obtain a floating image warped by the final deformation field;
S7: comparing the transformed floating image with the reference image to obtain a first total loss function, and repeatedly executing steps S2-S7 to train the cross-modal flow sub-network, the single-modal flow sub-network, the deformation field fusion network, and the spatial transformation network with the aim of minimizing the first total loss function; when training is finished, executing step S8;
S8: performing steps S2-S6 on a floating image of the first modality and a reference image of the second modality to be registered to obtain a transformed floating image, i.e., the registered image.
2. The cross-modality medical image registration method according to claim 1, wherein the image conversion network employs an improved Cycle-GAN network, and the second total loss function of the improved Cycle-GAN network is:

L_total = L_adv(D_1) + L_adv(D_2) + λ_cyc·L_cyc + λ_identity·L_identity + λ_MIND·L_MIND

wherein L_adv(D_1) and L_adv(D_2) are the adversarial losses of the two discriminators D_1, D_2 of the improved Cycle-GAN network; the cycle consistency loss L_cyc is a constraint on the reversibility of the transformations of the two generators G_1, G_2 of the improved Cycle-GAN network; the identity mapping loss L_identity is a regularizing constraint obtained by generating a converted image within the same modality; the structural consistency loss L_MIND is a constraint on the structural similarity between the original image and the generated image; and λ_cyc, λ_identity and λ_MIND denote the relative weights of L_cyc, L_identity and L_MIND respectively;
step S7 further comprises training the improved Cycle-GAN network with the aim of minimizing the second total loss function.
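A one-line sketch of how the second total loss could be assembled from its terms follows; the weighting values shown are illustrative, not the patent's.

```python
# Illustrative only: assembling the second total loss; the weights are made up.
def cyclegan_total_loss(adv_d1, adv_d2, l_cyc, l_identity, l_mind,
                        lam_cyc=10.0, lam_identity=5.0, lam_mind=1.0):
    """Weighted sum of the adversarial, cycle, identity and MIND terms."""
    return adv_d1 + adv_d2 + lam_cyc * l_cyc + lam_identity * l_identity + lam_mind * l_mind
```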
3. The cross-modality medical image registration method of claim 2, wherein the structural consistency loss L_MIND is:

L_MIND = 1/(N_1·|R|) · Σ_x |M(I_r1)(x) − M(G_2(I_r1))(x)| + 1/(N_2·|R|) · Σ_x |M(I_r2)(x) − M(G_1(I_r2))(x)|

wherein M denotes the modality-independent neighbourhood descriptor (MIND); I_r1 denotes the floating image of the first modality and I_r2 the reference image of the second modality; N_1 and N_2 denote the numbers of voxels in images I_r1 and I_r2 respectively; R denotes a non-local region around voxel x; image G_1(I_r2) is the generated image obtained by converting image I_r2 with generator G_1; and image G_2(I_r1) is the generated image obtained by converting image I_r1 with generator G_2.
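The full MIND formulation is not reproduced in the patent text; the following is a simplified, illustrative PyTorch reduction — six-neighbour patch self-similarity with a variance-based normalisation — intended only to make the structure of L_MIND concrete, not to reproduce the patent's exact descriptor.

```python
import torch
import torch.nn.functional as F

def mind_descriptor(img, eps=1e-6):
    """Simplified MIND-style self-similarity descriptor for a (B,1,D,H,W)
    volume: patch SSD to the six face neighbours, variance-normalised."""
    shifts = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    ssd = []
    for s in shifts:
        shifted = torch.roll(img, shifts=s, dims=(2, 3, 4))
        # patch-wise SSD via 3x3x3 box filtering of the squared difference
        ssd.append(F.avg_pool3d((img - shifted) ** 2, 3, stride=1, padding=1))
    d = torch.cat(ssd, dim=1)                      # (B,6,D,H,W)
    v = d.mean(dim=1, keepdim=True) + eps          # local variance estimate
    mind = torch.exp(-d / v)
    return mind / (mind.amax(dim=1, keepdim=True) + eps)

def mind_loss(original, generated):
    """L1 distance between the descriptors of the original image and the
    cross-modality generated image."""
    return (mind_descriptor(original) - mind_descriptor(generated)).abs().mean()
```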
4. The cross-modality medical image registration method of claim 2, wherein the identity mapping loss L_identity is:

L_identity = ||G_1(I_1) − I_1||_1 + ||G_2(I_2) − I_2||_1

wherein I_1 denotes an image of the first modality and I_2 an image of the second modality; image G_1(I_1) is the generated image obtained by converting the first-modality image I_1 with generator G_1, and image G_2(I_2) is the generated image obtained by converting the second-modality image I_2 with generator G_2.
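This term is direct to implement; a minimal PyTorch sketch, assuming g1 and g2 are the two generators, follows.

```python
import torch.nn.functional as F

def identity_loss(g1, g2, i1, i2):
    """Identity mapping loss of claim 4: each generator should leave an image
    that is already in its output modality unchanged (L1 norm)."""
    return F.l1_loss(g1(i1), i1) + F.l1_loss(g2(i2), i2)
```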
5. The cross-modality medical image registration method of claim 1, wherein the cross-modal flow sub-network employs a UNet network structure, the UNet network structure comprising an encoder and a decoder, with skip connections between the convolutional layers of the encoder and the decoder.
6. The cross-modality medical image registration method of claim 1, wherein the single-modal flow sub-network employs a UNet network structure, the UNet network structure comprising an encoder and a decoder, with skip connections between the convolutional layers of the encoder and the decoder.
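A minimal 3D UNet sketch with a single encoder-decoder level and one skip connection is shown below; the channel widths and depth are illustrative only, since the patent does not disclose the exact configuration of its UNet sub-networks.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv3d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.2))

class TinyUNet3D(nn.Module):
    """One-level 3D UNet sketch: two input volumes concatenated to 2 channels,
    a 3-channel deformation field out, with an encoder-decoder skip connection."""
    def __init__(self, c_in=2, c_out=3, w=16):
        super().__init__()
        self.enc1 = block(c_in, w)
        self.down = nn.MaxPool3d(2)
        self.enc2 = block(w, 2 * w)
        self.up = nn.Upsample(scale_factor=2, mode='trilinear', align_corners=False)
        self.dec1 = block(3 * w, w)          # 2w upsampled channels + w skipped channels
        self.head = nn.Conv3d(w, c_out, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.down(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.head(d1)
```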
7. The cross-modality medical image registration method of claim 1, wherein the deformation field fusion network is a 3D convolutional neural network.
8. The cross-modality medical image registration method of claim 1, wherein the spatial transformation network comprises a spatial grid generator that generates a sampling grid from the final deformation field and a sampler that spatially warps the floating image according to the sampling grid.
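A minimal sketch of such a spatial transformer, assuming the final deformation field is a (B,3,D,H,W) tensor of voxel displacements, is given below; it builds the sampling grid and resamples with PyTorch's grid_sample.

```python
import torch
import torch.nn.functional as F

def spatial_transform(moving, flow):
    """Grid generator + sampler: displace an identity grid by `flow` (voxel
    displacements, channel order z,y,x) and resample `moving` bilinearly."""
    B, _, D, H, W = moving.shape
    zz, yy, xx = torch.meshgrid(torch.arange(D), torch.arange(H),
                                torch.arange(W), indexing='ij')
    grid = torch.stack([zz, yy, xx]).float().to(moving.device)  # identity grid (3,D,H,W)
    coords = grid.unsqueeze(0) + flow                           # displaced sampling grid
    # normalise each axis to [-1, 1] as required by grid_sample
    for i, size in enumerate([D, H, W]):
        coords[:, i] = 2.0 * coords[:, i] / (size - 1) - 1.0
    # grid_sample expects the grid as (B,D,H,W,3) in x,y,z order
    grid_xyz = coords.permute(0, 2, 3, 4, 1).flip(-1)
    return F.grid_sample(moving, grid_xyz, mode='bilinear', align_corners=True)
```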
9. The cross-modality medical image registration method of claim 1, wherein the first total loss function is:
L(I_r1, I_r2, φ̂) = L_sim(I_r2, I_r1 ∘ φ̂) + λ·L_smooth(φ̂)

wherein φ̂ denotes the final deformation field; I_r1 denotes the floating image of the first modality and I_r2 the reference image of the second modality; I_r1 ∘ φ̂ denotes the floating image after warping by the final deformation field; the image similarity loss L_sim(I_r2, I_r1 ∘ φ̂) measures the similarity between the transformed floating image I_r1 ∘ φ̂ and the image I_r2; L_smooth(φ̂) is a regularization loss constraining the smoothness of the final deformation field φ̂; and λ is the regularization coefficient.
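A minimal sketch of this first total loss, with mean-squared error standing in for the unspecified image-similarity term and a squared-gradient penalty for the smoothness term, could read:

```python
import torch
import torch.nn.functional as F

def smoothness_loss(flow):
    """Regularisation term: mean squared spatial gradient of the field."""
    dz = flow[:, :, 1:, :, :] - flow[:, :, :-1, :, :]
    dy = flow[:, :, :, 1:, :] - flow[:, :, :, :-1, :]
    dx = flow[:, :, :, :, 1:] - flow[:, :, :, :, :-1]
    return dz.pow(2).mean() + dy.pow(2).mean() + dx.pow(2).mean()

def total_loss(warped, reference, flow, lam=1.0):
    """First total loss of claim 9; MSE is an illustrative stand-in for the
    image similarity term, and `lam` is the regularisation coefficient."""
    return F.mse_loss(warped, reference) + lam * smoothness_loss(flow)
```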
10. A cross-modality medical image registration apparatus comprising a processor and a readable storage medium storing executable instructions executable by the processor, wherein the executable instructions, when executed by the processor, cause the cross-modality medical image registration method of any one of claims 1 to 9 to be implemented.
CN202010652606.9A 2020-07-08 2020-07-08 Cross-modal medical image registration method and device Active CN111862174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010652606.9A CN111862174B (en) 2020-07-08 2020-07-08 Cross-modal medical image registration method and device


Publications (2)

Publication Number Publication Date
CN111862174A true CN111862174A (en) 2020-10-30
CN111862174B CN111862174B (en) 2023-10-03

Family

ID=73153705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010652606.9A Active CN111862174B (en) 2020-07-08 2020-07-08 Cross-modal medical image registration method and device

Country Status (1)

Country Link
CN (1) CN111862174B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140029812A1 (en) * 2012-07-30 2014-01-30 General Electric Company Methods and systems for determining a transformation function to automatically register different modality medical images
CN108711168A (en) * 2018-06-04 2018-10-26 中北大学 Non-rigid multimodal medical image registration method based on ZMLD Yu GC discrete optimizations
US20200184660A1 (en) * 2018-12-11 2020-06-11 Siemens Healthcare Gmbh Unsupervised deformable registration for multi-modal images
CN110021037A (en) * 2019-04-17 2019-07-16 南昌航空大学 A kind of image non-rigid registration method and system based on generation confrontation network
CN110838139A (en) * 2019-11-04 2020-02-25 上海联影智能医疗科技有限公司 Training method of image registration model, image registration method and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pan Meisen; Tang Jingtian; Yang Xiaoli: "Medical Image Registration Using PCA and PSNR", Infrared and Laser Engineering, vol. 40, no. 2, pages 355-364 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232362A (en) * 2020-11-04 2021-01-15 清华大学深圳国际研究生院 Cross-modal medical image registration method and computer-readable storage medium
CN112669327B (en) * 2020-12-25 2023-02-14 上海交通大学 Magnetic resonance image segmentation system and segmentation method thereof
CN112669327A (en) * 2020-12-25 2021-04-16 上海交通大学 Magnetic resonance image segmentation system and segmentation method thereof
CN112650886A (en) * 2020-12-28 2021-04-13 电子科技大学 Cross-modal video time retrieval method based on cross-modal dynamic convolution network
CN112650886B (en) * 2020-12-28 2022-08-02 电子科技大学 Cross-modal video time retrieval method based on cross-modal dynamic convolution network
CN112802072A (en) * 2021-02-23 2021-05-14 临沂大学 Medical image registration method and system based on counterstudy
CN112927280A (en) * 2021-03-11 2021-06-08 北京的卢深视科技有限公司 Method and device for acquiring depth image and monocular speckle structured light system
CN113012086A (en) * 2021-03-22 2021-06-22 上海应用技术大学 Cross-modal image synthesis method
CN113012086B (en) * 2021-03-22 2024-04-16 上海应用技术大学 Cross-modal image synthesis method
WO2022198526A1 (en) * 2021-03-24 2022-09-29 Nec Corporation Methods, devices and computer readable media for image processing
WO2022205500A1 (en) * 2021-03-31 2022-10-06 华中科技大学 Method for constructing registration model for non-rigid multimodal medical image, and application thereof
CN113012204B (en) * 2021-04-09 2024-01-16 福建自贸试验区厦门片区Manteia数据科技有限公司 Registration method, registration device, storage medium and processor for multi-mode image
CN113012204A (en) * 2021-04-09 2021-06-22 福建自贸试验区厦门片区Manteia数据科技有限公司 Multi-modal image registration method and device, storage medium and processor
CN113112534A (en) * 2021-04-20 2021-07-13 安徽大学 Three-dimensional biomedical image registration method based on iterative self-supervision
CN113112534B (en) * 2021-04-20 2022-10-18 安徽大学 Three-dimensional biomedical image registration method based on iterative self-supervision
WO2022247218A1 (en) * 2021-05-27 2022-12-01 广州柏视医疗科技有限公司 Image registration method based on automatic delineation
CN113344876A (en) * 2021-06-08 2021-09-03 安徽大学 Deformable registration method between CT and CBCT
CN113538533B (en) * 2021-06-22 2023-04-18 南方医科大学 Spine registration method, device and equipment and computer storage medium
CN113538533A (en) * 2021-06-22 2021-10-22 南方医科大学 Spine registration method, spine registration device, spine registration equipment and computer storage medium
CN113450397A (en) * 2021-06-25 2021-09-28 广州柏视医疗科技有限公司 Image deformation registration method based on deep learning
CN113450397B (en) * 2021-06-25 2022-04-01 广州柏视医疗科技有限公司 Image deformation registration method based on deep learning
CN113487656A (en) * 2021-07-26 2021-10-08 推想医疗科技股份有限公司 Image registration method and device, training method and device, control method and device
CN114119689A (en) * 2021-12-02 2022-03-01 厦门大学 Multi-modal medical image unsupervised registration method and system based on deep learning
CN114359356A (en) * 2021-12-28 2022-04-15 上海联影智能医疗科技有限公司 Training method of image registration model, image registration method, device and medium
CN114511599A (en) * 2022-01-20 2022-05-17 推想医疗科技股份有限公司 Model training method and device, medical image registration method and device
CN114511599B (en) * 2022-01-20 2022-09-20 推想医疗科技股份有限公司 Model training method and device, medical image registration method and device
WO2023173827A1 (en) * 2022-03-15 2023-09-21 腾讯科技(深圳)有限公司 Image generation method and apparatus, and device, storage medium and computer program product
CN114359360B (en) * 2022-03-17 2022-06-10 成都信息工程大学 Two-way consistency constraint medical image registration algorithm based on confrontation
CN114359360A (en) * 2022-03-17 2022-04-15 成都信息工程大学 Two-way consistency constraint medical image registration algorithm based on countermeasure
CN114387317A (en) * 2022-03-24 2022-04-22 真健康(北京)医疗科技有限公司 CT image and MRI three-dimensional image registration method and device
CN115375971A (en) * 2022-08-24 2022-11-22 北京医智影科技有限公司 Multi-modal medical image registration model training method, registration method, system and equipment
CN115830016B (en) * 2023-02-09 2023-04-14 真健康(北京)医疗科技有限公司 Medical image registration model training method and equipment
CN115830016A (en) * 2023-02-09 2023-03-21 真健康(北京)医疗科技有限公司 Medical image registration model training method and equipment
CN116402865A (en) * 2023-06-06 2023-07-07 之江实验室 Multi-mode image registration method, device and medium using diffusion model
CN116402865B (en) * 2023-06-06 2023-09-15 之江实验室 Multi-mode image registration method, device and medium using diffusion model

Also Published As

Publication number Publication date
CN111862174B (en) 2023-10-03

Similar Documents

Publication Publication Date Title
CN111862174A (en) Cross-modal medical image registration method and device
Yao et al. On improving bounding box representations for oriented object detection
CN111862175B (en) Cross-modal medical image registration method and device based on cyclic canonical training
Zhang et al. Text-guided neural image inpainting
CN110544239B (en) Multi-modal MRI conversion method, system and medium for generating countermeasure network based on conditions
CN111260741B (en) Three-dimensional ultrasonic simulation method and device by utilizing generated countermeasure network
CN110910351B (en) Ultrasound image modality migration and classification method and terminal based on generation countermeasure network
CN110503626B (en) CT image modality alignment method based on space-semantic significance constraint
CN111325750B (en) Medical image segmentation method based on multi-scale fusion U-shaped chain neural network
Raju et al. Deep implicit statistical shape models for 3d medical image delineation
Lu et al. A novel image registration approach via combining local features and geometric invariants
Fang et al. Reliable mutual distillation for medical image segmentation under imperfect annotations
CN111242953A (en) MR image segmentation method and device based on condition generation countermeasure network
Jiang et al. Unpaired cross-modality educed distillation (CMEDL) for medical image segmentation
Chen et al. MASS: Modality-collaborative semi-supervised segmentation by exploiting cross-modal consistency from unpaired CT and MRI images
CN113205567A (en) Method for synthesizing CT image by MRI image based on deep learning
CN117437420A (en) Cross-modal medical image segmentation method and system
Chen et al. Deep semi-supervised ultrasound image segmentation by using a shadow aware network with boundary refinement
Huang et al. Push the boundary of sam: A pseudo-label correction framework for medical segmentation
Miao et al. SC-SSL: Self-correcting Collaborative and Contrastive Co-training Model for Semi-Supervised Medical Image Segmentation
Zhang et al. Semisam: Exploring sam for enhancing semi-supervised medical image segmentation with extremely limited annotations
Liao et al. FisheyeEX: Polar outpainting for extending the FoV of fisheye lens
Alshehri et al. Self-Attention-Based Edge Computing Model for Synthesis Image to Text through Next-Generation AI Mechanism
Chong 3D reconstruction of laparoscope images with contrastive learning methods
Shao et al. Smudlp: Self-teaching multi-frame unsupervised endoscopic depth estimation with learnable patchmatch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant