CN112102385A - Multi-modal liver magnetic resonance image registration system based on deep learning - Google Patents
- Publication number
- CN112102385A (application CN202010845127.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- registration
- network
- training
- gan
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/30 — Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/38 — Registration of image sequences
- G06T3/14 — Transformations for image registration, e.g. adjusting or mapping for alignment of images
- G06T3/147 — Transformations for image registration using affine transformations
- G06T2207/10088 — Magnetic resonance imaging [MRI]
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30056 — Liver; Hepatic
Abstract
The invention belongs to the technical field of magnetic resonance image registration and specifically relates to a multi-modal liver magnetic resonance image registration system based on deep learning. The system comprises: an image preprocessing module, which preprocesses the source and target images and performs pre-registration with a traditional method; a training sample screening module, which selects, from the samples pre-registered by the traditional method, those suitable for GAN training; and a GAN module, which registers the multi-modal liver magnetic resonance images and consists of a discriminator network and a generator network. The GAN is pre-trained to learn the mapping between image domains of different contrasts, so that an image of one contrast can be converted into an image of another. During registration, the trained GAN then assists the traditional registration method, eliminating errors caused by modality differences, shortening the registration time, and improving the registration accuracy.
Description
Technical Field
The invention belongs to the technical field of magnetic resonance image registration, and specifically relates to a registration system for liver magnetic resonance images of different contrasts, i.e. different modalities.
Background
Liver cancer is a common tumor in China, the sixth most common cancer worldwide, and the cancer with the second highest fatality rate in Asia; early and accurate diagnosis of liver lesions is therefore of great importance. In diagnosing liver lesions, image information from multiple modalities is usually combined, drawing on the characteristics and strengths of each. For example, dynamic contrast-enhanced MRI (DCE-MRI) quantitatively analyzes the permeability of tumor vessels: it provides the morphological characteristics of liver lesions and also reflects their blood supply. Diffusion-weighted imaging (DWI) quantitatively analyzes the characteristics and tissue structure of liver cells, provides more information than conventional MRI, and is valuable for determining pathological changes and physiological activity at the lesion site. In recent years, blood-oxygen-level-dependent functional MRI (BOLD-fMRI), which uses deoxyhemoglobin as an endogenous contrast agent, has been able to reflect changes in hemodynamics and blood oxygen content, and can be used for the diagnosis and severity assessment of liver disease. The purpose of image registration is to align images of different modalities spatially or temporally so that lesions can be located more accurately.
Traditional multi-modal registration methods mostly adopt linear or nonlinear transformation models. Linear models perform rigid, similarity, and affine registration by adjusting a linear transformation matrix and a translation vector, but they cannot account for local differences between the source and target images. Nonlinear methods commonly use deformation-field-based spline functions, such as thin-plate splines and B-splines, and obtain an optimal smooth displacement field by solving for the mapping between corresponding control points of the target and reference images. However, as the number of control points grows, the coefficient matrix becomes larger and larger, and computational accuracy and stability degrade.
For multi-modal magnetic resonance image registration, the difficulty lies mainly in the contrast difference between modalities, and in the anatomical differences of the corresponding organs caused by acquiring the modalities at different time points; both make feature-space extraction very hard. Consequently, applying a conventional registration method directly to multi-modal data usually cannot reach high accuracy, and its speed also leaves room for improvement.
Disclosure of Invention
The invention aims to provide a multi-modal liver magnetic resonance image registration system that reduces the difficulty of feature extraction between multi-modal images and achieves more accurate and faster registration.
In the system provided by the invention, a pre-trained GAN converts the source-modality image into a target-modality image, which reduces the difficulty of feature-space extraction and lets the assisted traditional registration method register more accurately and quickly.
The multi-modal liver magnetic resonance image registration system provided by the invention is based on deep learning and specifically comprises the following modules.
The image preprocessing module preprocesses the source and target images and pre-registers them with a traditional method, i.e. rigid, affine, and then nonlinear registration in sequence.
In the image preprocessing module, there are three main preprocessing steps:
(1) because some acquisitions leave the first and last slices incomplete, 3-10 slices are removed from the beginning and end of each 3D image sample;
(2) all 2D images fed into the network are resampled to a resolution between 128 x 128 and 1024 x 1024;
(3) each 2D image is normalized according to the pixel-value range of the image pair and the output range of the network's activation function, mapping pixel values into [0,1] or [-1,1].
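Step (3) amounts to a min-max normalization into the activation range. A minimal sketch, assuming the hypothetical helper name `normalize_slice` (not from the patent):

```python
import numpy as np

def normalize_slice(img, out_range=(-1.0, 1.0)):
    """Min-max normalize a 2D slice into the network's activation range.

    out_range is (0.0, 1.0) for sigmoid-style outputs or (-1.0, 1.0)
    for tanh-style outputs, matching step (3) above.
    """
    lo, hi = out_range
    img = np.asarray(img, dtype=np.float32)
    mn, mx = float(img.min()), float(img.max())
    if mx == mn:  # constant slice: map everything to the lower bound
        return np.full(img.shape, lo, dtype=np.float32)
    return lo + (img - mn) * (hi - lo) / (mx - mn)
```

Per-slice min-max is one reasonable reading; normalizing with a fixed [0, 255] range, as in the embodiment below, is another.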
The training sample screening module screens the samples pre-registered by the traditional method, keeping those with good slice-to-slice correspondence and clear organ tissue as training samples for the GAN.
The GAN module, once trained, registers the multi-modal liver magnetic resonance images. The GAN is built on, but not limited to, the pix2pix architecture, and consists of a discriminator network and a generator network, wherein:
the generator network, shown in fig. 4, adopts a U-Net-like structure and comprises 5-12 convolutional layers and 5-15 deconvolution layers, each convolutional layer using a suitable activation function, including but not limited to ReLU, tanh, etc.;
the discriminator network, shown in FIG. 5, adopts a patch-GAN structure: an image passes through multiple convolutional layers to yield a small feature matrix, a sigmoid activation maps each feature value to a probability between 0 and 1, and the mean of these probabilities is the discriminator's output. The discriminator comprises convolutional layers: 5-10 strided convolutional layers followed by a final convolutional layer.
The system's workflow is as follows:
(I) the image preprocessing module preprocesses the source and target images and pre-registers them with the traditional method, i.e. rigid, affine, and then nonlinear registration in sequence;
(II) the training sample screening module screens the pre-registered samples, keeping those with good slice correspondence and clear organ tissue as GAN training samples; the screening may be done manually;
(III) the GAN is trained with the screened samples;
during training, the results are compared, and the image preprocessing, network training, and data enhancement methods are adjusted to obtain the best training effect.
The image preprocessing can be adjusted in three ways:
(1) because some acquisitions leave the first and last slices incomplete, 3-10 slices are removed from the beginning and end of each 3D image sample;
(2) all 2D images fed into the network are resampled to a resolution between 128 x 128 and 1024 x 1024;
(3) each 2D image is normalized according to the pixel-value range of the image pair and the output range of the network's activation function, mapping pixel values into [0,1] or [-1,1].
The data enhancement methods include, but are not limited to, flipping, rotation, affine transformation, noise injection, cropping, color perturbation, and scaling; the specific enhancement modes are:
flipping: horizontal, vertical, or diagonal, one of the three chosen at random;
rotation: an angle threshold theta is chosen (theta in the range 120 to 180 degrees) and the image is rotated by a random angle between -theta and theta degrees;
affine transformation: four points are drawn at random, one from each circular region of diameter n pixels centered on the four corners of the image, and used as the transformation vertices of an affine transformation;
scaling: the image is scaled by a factor drawn at random within a certain percentage of the image size (typically 10%-15%).
The network training method can be adjusted using, but not limited to, the conjugate gradient method, the alternating-direction method of multipliers, stochastic gradient methods, and other training methods, with a suitable batch size and learning rate; the network converges after a certain number of training rounds.
(IV) GAN-assisted registration is performed as follows:
(1) the three-dimensional source-modality image to be registered is sliced and fed into the trained GAN as two-dimensional images; the corresponding two-dimensional target-modality simulated images are output and reassembled into a three-dimensional simulated image;
(2) the simulated target-modality image from the previous step is registered against the target-modality image to be registered with the traditional method, i.e. rigid, affine, and then nonlinear registration in sequence, yielding the pixel-wise mapping, i.e. the deformation field;
(3) the deformation field from the previous step is applied to (fused with) the source image to be registered, completing the GAN-assisted registration.
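Step (3), applying the deformation field to the source image, can be sketched with nearest-neighbour resampling. The function name `apply_deformation` and the (dy, dx) field layout are illustrative assumptions, not the patent's implementation (ANTs uses its own warp format and interpolation):

```python
import numpy as np

def apply_deformation(image, field):
    """Warp a 2D image with a dense displacement field.

    field[y, x] = (dy, dx): output pixel (y, x) is sampled from
    (y + dy, x + dx) in the input, with nearest-neighbour rounding
    and border clamping.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sy = np.clip(np.rint(ys + field[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.rint(xs + field[..., 1]).astype(int), 0, w - 1)
    return image[sy, sx]
```

A zero field leaves the image unchanged; a constant field produces a pure translation.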
Compared with the prior art, the system has the following advantages:
(1) a novel deep-learning-based assisted registration system is provided. Traditional multi-modal registration uses mutual information as the loss function, and whether mutual information can achieve accurate registration depends on a good initialization, i.e. on knowing a sufficient number of correct correspondences at the start of the optimization. The invention generates target-modality images with the GAN, and this modality conversion gives the traditional registration method a good initialization, improving registration accuracy;
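The mutual-information similarity referred to above can be estimated from a joint histogram. A minimal sketch; the bin count and the plug-in estimator are illustrative choices, not the patent's:

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram estimate of the mutual information between two images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of a
    py = pxy.sum(axis=0, keepdims=True)  # marginal of b
    nz = pxy > 0                         # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

An image shares more information with itself than with an unrelated image, which is why the measure only drives registration well once roughly correct correspondences exist.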
(2) the registration time of the traditional method is shortened. Because the GAN converts the source modality into the target modality before it enters the traditional registration method, image features are easier to extract, which shortens the registration; over a number of samples, the average registration time was shortened by 14 seconds;
(3) the GAN learns images of many modalities well, not limited to DWI, DCE, and fMRI, or even CT; GANs with different functions can therefore be obtained simply by feeding in images of the corresponding modalities for training, to assist registration between different modality pairs;
(4) network training data are easy to obtain. The GAN places low demands on training data: it only requires that the input source image correspond well, slice by slice, with the target image and have clear tissue contours. By changing the GAN's structure (e.g., to Cycle-GAN), training can even be done with unpaired datasets.
Drawings
Fig. 1 is a diagram of a GAN training process.
Fig. 2 is a GAN assisted registration flow diagram.
Fig. 3 shows the result of the conventional pre-registration.
Fig. 4 is a diagram of a generator network structure in the present invention.
Fig. 5 is a diagram showing the structure of a discriminator network in the present invention.
Fig. 6 illustrates simulated DCE images generated by the GAN network.
Fig. 7 shows the deformation field generated after ANTs registration.
Fig. 8 is a comparison of GAN assisted registration and ANTs registration results. Wherein (a), (b) and (c) are three cases.
Detailed description of the invention
In the following, a batch of samples from 100 patients with lesion annotations, comprising DCE-MRI and DWI images, is used as an example, and the invention is further explained with reference to the drawings. Variations and modifications of the following process that do not depart from the spirit of the invention fall within its scope. The specific workflow of the system is as follows.
(I) The raw DWI and DCE data of the training samples are preprocessed and pre-registered with the traditional method, i.e. rigid, affine, and then nonlinear registration in sequence.
(II) The samples pre-registered by the traditional method are screened (manually); samples with good slice correspondence (for example as shown in fig. 3) and clear organ tissue are kept as GAN training samples. The samples were finally divided into two groups: 92% for network training and 8% for final result validation.
(III) A GAN is built on the pix2pix architecture. Its generator network, shown in fig. 4, adopts a U-Net-like structure with a downsampling stage and an upsampling stage. In the downsampling stage, image features are extracted by convolutional layers. In the upsampling stage, the network uses skip connections: the dropout output of the previous layer is concatenated with the output of the symmetric downsampling convolutional layer and used as the input of the deconvolution layer, so that the information extracted during downsampling is re-used during upsampling and the generated image retains as much of the original image's information as possible. The downsampling stage comprises 3 strided convolutional layers with stride 2 and 64, 128, and 256 kernels respectively, which reduce the (256, 256, 1) input image to (32, 32, 256) features; 5 further stride-2 convolutional layers with 512 kernels reduce the features to (1, 1, 512). In the upsampling stage, 4 deconvolution layers with stride 2 and 512 kernels upsample the feature matrix to (16, 16, 512), and 4 further stride-2 deconvolution layers with 256, 128, 64, and 1 kernels restore it to (256, 256, 1), i.e. an image of the same size as the original is output. Every convolutional layer except the last includes normalization and ReLU activation, and the generator ends with a tanh activation. In the upsampling stage, the dropout rate of each deconvolution layer is set to 0.5.
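The layer-by-layer feature sizes in the generator description can be verified with a small shape trace (illustrative only: stride-2 convolutions halve the spatial size, stride-2 deconvolutions double it; channel counts follow the text):

```python
def generator_shapes(size=256):
    """Trace (H, W, C) through the generator described in the text:
    8 stride-2 convolutions down, then 8 stride-2 deconvolutions up."""
    shapes, s = [], size
    for ch in [64, 128, 256] + [512] * 5:     # downsampling stage
        s //= 2
        shapes.append((s, s, ch))
    for ch in [512] * 4 + [256, 128, 64, 1]:  # upsampling stage
        s *= 2
        shapes.append((s, s, ch))
    return shapes
```

The trace confirms the (32, 32, 256), (1, 1, 512), (16, 16, 512), and (256, 256, 1) waypoints quoted in the description.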
The discriminator network structure is shown in fig. 5. To judge local regions of an image more accurately, the pix2pix discriminator adopts a patch-GAN structure: an image is passed through multiple convolutional layers to obtain a small feature matrix, a sigmoid activation maps each feature value to a probability between 0 and 1, and the mean of these probabilities is the discriminator's output. Concretely: the original DWI image and the original or simulated DCE image are input to the discriminator in pairs; 4 strided convolutional layers with stride 2 and 64, 128, 256, and 512 kernels produce a (16, 16, 512) feature map, and a convolutional layer with stride 1 and a single kernel reduces it to a (16, 16, 1) feature matrix. Each feature value is then passed through a sigmoid activation to obtain a probability, and the probabilities are averaged to give the discriminator's final output.
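The final sigmoid-and-average step of the patch-GAN output can be sketched directly (the helper name `patch_gan_output` is illustrative; the convolutions producing the feature matrix are omitted):

```python
import numpy as np

def patch_gan_output(feature_matrix):
    """Turn the (16, 16, 1) raw feature matrix into the discriminator's
    scalar output: sigmoid per patch, then the mean probability."""
    feats = np.asarray(feature_matrix, dtype=np.float64)
    probs = 1.0 / (1.0 + np.exp(-feats))  # element-wise sigmoid
    return float(probs.mean())
```

Each of the 16 x 16 entries scores one receptive-field patch of the input pair, so the mean is an average of local real/fake judgments.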
After repeated adjustment, the final preprocessing adopted by the invention consists of the following three steps:
(1) because some acquisitions leave the first and last slices incomplete, the first 3 and last 3 slices of each 3D image sample are removed;
(2) all 2D images fed into the network are resampled to 256 x 256;
(3) the pixel range of each picture is [0, 255]; because the generator ends with a tanh activation whose output lies in (-1, 1), each 2D image is normalized so that its pixel values fall in [-1, 1].
Online data enhancement is adopted, using flipping, rotation, affine transformation, and scaling with the following parameters:
flipping: horizontal, vertical, or diagonal, one of the three chosen at random;
rotation: a random angle between -180 and 180 degrees;
affine transformation: four points are drawn at random, one from each circular region of diameter 22 pixels (i.e. within 5% of the total pixels) centered on the four corners of the image, and used as the transformation vertices of an affine transformation;
scaling: the image is scaled by a factor drawn at random in the range 10%-15% of the image size.
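The random parameter draws above can be sketched as follows. This is a hedged illustration: the helper names are invented, the diagonal flip is read as a transpose, and "10%-15% scaling" is interpreted as randomly enlarging or shrinking by that amount, which the text does not state explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    """One of the three flips named above, chosen at random."""
    mode = rng.integers(3)
    if mode == 0:
        return np.fliplr(img)  # horizontal flip
    if mode == 1:
        return np.flipud(img)  # vertical flip
    return img.T               # diagonal flip (transpose)

def random_rotation_angle(theta=180.0):
    """A rotation angle drawn uniformly from [-theta, theta] degrees."""
    return float(rng.uniform(-theta, theta))

def random_scale_factor(low=0.10, high=0.15):
    """A 10%-15% scale change, randomly enlarging or shrinking."""
    delta = rng.uniform(low, high)
    return 1.0 + delta if rng.integers(2) else 1.0 - delta
```

The actual resampling (rotating or rescaling the pixel grid) would then be applied with an image library using these parameters.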
The network was trained on a Tesla V100 GPU with a batch size of 80, a learning rate of 2e-4, and 20000 training rounds, at which the best results were obtained, as shown in fig. 6.
(V) The 7 three-dimensional DWI images to be registered are cut into slices and fed into the GAN as two-dimensional images; the corresponding two-dimensional simulated DCE slices are output, reassembled into three-dimensional simulated DCE images, and normalized in three dimensions to eliminate pixel-intensity jumps between slices.
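The slice-translate-restack step can be sketched as below. The function name `translate_volume` is illustrative, and any 2D slice-to-slice callable stands in for the trained GAN generator; the final 3D min-max pass models the whole-volume renormalization that suppresses inter-slice jumps:

```python
import numpy as np

def translate_volume(volume, generator_2d):
    """Run a 2D slice-to-slice generator over a 3D volume and restack.

    generator_2d maps one 2D source-modality slice to one 2D simulated
    target-modality slice (a placeholder for the trained GAN here).
    """
    slices = [generator_2d(volume[k]) for k in range(volume.shape[0])]
    sim = np.stack(slices, axis=0)
    # 3D min-max renormalization over the whole volume, so that slice-wise
    # intensity jumps are removed by a single shared scale.
    mn, mx = sim.min(), sim.max()
    return (sim - mn) / (mx - mn) if mx > mn else sim
```

Normalizing over the volume rather than per slice is what keeps adjacent slices on a common intensity scale.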
(VI) The simulated DCE image is registered against the target DCE image with the traditional method, i.e. rigid, affine, and then nonlinear registration in sequence, yielding the pixel-wise mapping, i.e. the deformation field, as shown in FIG. 7.
(VII) The deformation field from the previous step is applied to the DWI image to be registered, completing the GAN-assisted registration process.
(VIII) The same 7 samples are also registered with the traditional method alone and compared visually with the GAN-assisted results; the 3 of the 7 cases with the most visible differences are shown in fig. 8. In the result images, the Mask layer is the lesion position annotated on the DCE images by a physician; GAN+ANTs and ANTs denote the results of GAN-assisted registration and of the traditional method alone, respectively, and the yellow boxes mark the lesion regions. Across the three samples, GAN+ANTs registers the lesion on the DWI into the same slice as the lesion annotated on the DCE image, with more than 70% overlap; in the ANTs-only results, no lesion can be observed and the slices do not correspond accurately.
(IX) Result analysis: the tumor positions in the DWI images registered by the two methods are marked and compared with the physician's annotations, and a similarity measure is computed for each; this embodiment uses the Dice index. The time consumed by each method is also recorded to evaluate speed. The detailed analysis of the 7 samples is shown in Table 1.
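The Dice index used for the similarity analysis is a standard overlap measure between two binary masks, 2|A∩B| / (|A|+|B|). A minimal sketch (the helper name `dice_index` and the empty-mask convention are illustrative):

```python
import numpy as np

def dice_index(mask_a, mask_b):
    """Dice similarity between two binary lesion masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:  # both masks empty: treat as full agreement
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total
```

A Dice value of 0 (as for the samples ANTs alone could not register) means the registered lesion mask and the gold-standard mask do not overlap at all; 1 means perfect overlap.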
As Table 1 shows, in the 5 samples (6 lesions) with non-zero Dice values, the GAN-assisted registration method improves the result in every case, with Dice gains between 0.04 and 0.45; and for the three samples whose lesions ANTs alone could not register (Dice value 0), the registered lesion area now overlaps a small part of the gold standard. In terms of time, registration is faster for 3 of the 5 samples; overall, the mean registration time with the GAN is 742.88 seconds versus 756.16 seconds for the traditional method, a slight improvement.
Thus, by both visual inspection and Dice analysis, feeding the GAN-generated image together with the DCE into ANTs achieves registration, improves accuracy in most cases, and shortens the registration time.
Table 1. Comparison of the Dice index and registration time of the two methods across the test samples.
Claims (4)
1. A multi-modal liver magnetic resonance image registration system based on deep learning is characterized by specifically comprising the following modules:
the image preprocessing module is used for preprocessing a source image and a target image and carrying out pre-registration by using a traditional method, namely carrying out rigid registration, affine registration and nonlinear registration in sequence;
the training sample screening module is used for screening the sample pre-registered by the traditional method, screening out a sample with good hierarchical correspondence and clear organ tissues as a training sample of the GAN network;
the GAN network module is trained and used for registering the multi-mode liver magnetic resonance image; the GAN network is built by adopting a pix2pix architecture; the constructed GAN network comprises a discriminator network and a generator network; wherein:
the generator network adopts a U-Net-like structure, comprising 5-12 convolutional layers and 5-15 deconvolution layers, with an appropriate activation function selected for each convolutional layer;
the discriminator network adopts a patch-GAN structure, that is, the image passes through multiple convolutional layers to obtain a feature matrix of smaller size; after a sigmoid activation function, each feature value in the matrix becomes a probability value between 0 and 1, and the mean of these probability values is the judgment result output by the discriminator network; the discriminator network comprises: convolutional layers, 5-10 strided convolutional layers, and a final convolutional layer;
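The decision rule described for the patch-GAN discriminator (a sigmoid over the feature matrix, then the mean) can be sketched as follows; the convolutional layers themselves are omitted, and the logit map is a hypothetical stand-in for the output of the final convolutional layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def patchgan_decision(logit_map):
    """Map each patch logit to a probability in (0, 1) with a sigmoid,
    then average: the mean is the discriminator's real/fake score."""
    probs = sigmoid(np.asarray(logit_map, dtype=float))
    return float(probs.mean())

# Hypothetical 30x30 logit map, e.g. the final conv layer's output
logits = np.zeros((30, 30))        # all-zero logits -> each prob 0.5
score = patchgan_decision(logits)  # -> 0.5
```

Averaging per-patch probabilities rather than producing a single scalar is what lets the patch-GAN discriminator judge local texture realism across the whole image.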
the system has the following working flows:
firstly, preprocessing a source image and a target image by an image preprocessing module, and performing pre-registration by using a traditional method, namely sequentially performing rigid registration, affine registration and nonlinear registration;
secondly, the training sample screening module screens the samples pre-registered by the traditional method, selecting those with good slice-to-slice correspondence and clearly visible organ tissue as training samples for the GAN network;
thirdly, training the GAN network by using the screened training samples;
during training, the training results are compared, and the image preprocessing method, the network training method and the data enhancement method are adjusted to achieve the best training effect;
fourthly, performing auxiliary registration with the GAN network, the specific process being as follows:
(1) inputting the slices of the three-dimensional source-modality image to be registered into the trained GAN network as two-dimensional images, outputting the corresponding two-dimensional target-modality simulation images, and stacking these two-dimensional simulation images into a three-dimensional simulation image;
(2) performing traditional registration between the target-modality simulation image obtained in the previous step and the target-modality image to be registered, namely sequentially performing rigid registration, affine registration and nonlinear registration, to obtain the corresponding pixel mapping relation, namely the deformation field;
(3) applying the deformation field obtained in the previous step to the source image to be registered, thereby completing the GAN-network-assisted registration.
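Step (1) of this process, running the 2D generator slice by slice over the 3D volume and restacking the results, can be sketched as follows; the identity generator here is a hypothetical stand-in for the trained pix2pix network:

```python
import numpy as np

def translate_volume(volume, generator, axis=0):
    """Run a 2D slice-wise generator over a 3D volume and restack
    the translated slices into a 3D simulation volume."""
    slices = np.moveaxis(volume, axis, 0)            # iterate over chosen axis
    fake = np.stack([generator(s) for s in slices])  # 2D -> 2D per slice
    return np.moveaxis(fake, 0, axis)                # restore slice axis

identity_gen = lambda s: s                 # hypothetical stand-in generator
vol = np.random.rand(20, 64, 64)           # e.g. 20 axial slices of 64x64
sim = translate_volume(vol, identity_gen)  # simulated target-modality volume
assert sim.shape == vol.shape
```

The resulting simulation volume is then registered to the real target-modality image with the same rigid/affine/nonlinear pipeline, and the deformation field is applied back to the original source volume.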
2. The deep learning-based multi-modal liver magnetic resonance image registration system according to claim 1, wherein the image preprocessing method is adjusted in three ways:
(1) for images whose first and last slices are incomplete due to the acquisition process, removing 3-10 slices from the beginning and the end of each 3D image sample;
(2) resampling all 2D images fed into the network to a resolution between 128×128 and 1024×1024;
(3) normalizing the 2D images according to the pixel value range of the image pair and the output range of the network's final activation function, scaling pixel values into the range [0,1] or [-1,1].
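The normalization in (3) rescales pixel values into the activation's output range; a minimal sketch, assuming a simple per-image min-max (the claim does not fix the exact formula):

```python
import numpy as np

def normalize(img, lo=0.0, hi=1.0):
    """Min-max normalize pixel values into [lo, hi]; [0, 1] suits a
    sigmoid output, [-1, 1] a tanh output."""
    img = img.astype(float)
    rng = img.max() - img.min()
    if rng == 0:
        return np.full_like(img, lo)  # constant image: map to lower bound
    unit = (img - img.min()) / rng    # first rescale to [0, 1]
    return lo + unit * (hi - lo)      # then shift to the target range

x = np.array([[0.0, 500.0], [1000.0, 250.0]])  # raw MR intensities
y = normalize(x)             # values in [0, 1]
z = normalize(x, -1.0, 1.0)  # values in [-1, 1]
```

Matching the normalization range to the generator's output activation keeps the intensity statistics of real and generated images comparable during adversarial training.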
3. The deep learning-based multi-modal liver magnetic resonance image registration system according to claim 1, wherein the adjusted data enhancement methods specifically comprise: flipping, rotation, affine transformation, noise addition, cropping, color perturbation and scaling;
flipping: comprises three modes, namely horizontal, vertical and diagonal flipping; one of the three is randomly selected for data enhancement;
rotation: an angle threshold θ is selected and the image is rotated by a random angle between -θ and θ degrees; θ ranges between 120° and 180°;
affine transformation: one point is randomly taken within each of four circular regions of diameter n pixels centred on the four corner points of the image, and these four points serve as the transformation vertices of the affine transformation;
scaling: a value is randomly taken within a certain percentage range of the image size and the image is scaled accordingly; the percentage ranges from 10% to 15%.
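The flipping mode of claim 3 can be sketched as follows; the diagonal flip is implemented here as a transpose, which assumes square images:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    """Randomly apply one of the three flips named in claim 3:
    horizontal, vertical, or diagonal (transpose)."""
    mode = rng.integers(3)
    if mode == 0:
        return np.fliplr(img)  # horizontal flip
    if mode == 1:
        return np.flipud(img)  # vertical flip
    return img.T               # diagonal flip (square image assumed)

img = np.arange(16).reshape(4, 4)
aug = random_flip(img)
assert aug.shape == img.shape
```

Because all three flips are involutions on square images, the enhancement never changes the label geometry beyond the known symmetry, which is what makes it safe for registration training pairs.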
4. The deep learning-based multi-modal liver magnetic resonance image registration system according to claim 1, wherein the network training method is adjusted, specifically: a training algorithm such as the conjugate gradient method, the alternating direction method of multipliers, or the stochastic gradient method is selected; an appropriate batch size and learning rate are set; and the network converges after a certain number of training rounds.
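Of the training algorithms listed, the stochastic gradient method is the simplest to illustrate; a one-dimensional sketch on a hypothetical quadratic loss (not the GAN objective itself):

```python
import numpy as np

def sgd(grad, w0, lr=0.1, steps=100):
    """Plain gradient descent update rule: w <- w - lr * grad(w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2(w - 3);
# the iterates converge geometrically to the minimizer w = 3
w_star = sgd(lambda w: 2.0 * (w - 3.0), w0=[0.0])
```

In practice the batch size and learning rate mentioned in the claim control, respectively, the noise of the gradient estimate and the step size of exactly this update.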
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010845127.9A CN112102385B (en) | 2020-08-20 | 2020-08-20 | Multi-modal liver magnetic resonance image registration system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112102385A true CN112102385A (en) | 2020-12-18 |
CN112102385B CN112102385B (en) | 2023-02-10 |
Family
ID=73753352
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010845127.9A Active CN112102385B (en) | 2020-08-20 | 2020-08-20 | Multi-modal liver magnetic resonance image registration system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112102385B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180075581A1 (en) * | 2016-09-15 | 2018-03-15 | Twitter, Inc. | Super resolution using a generative adversarial network |
CN109754447A (en) * | 2018-12-28 | 2019-05-14 | 上海联影智能医疗科技有限公司 | Image generating method, device, equipment and storage medium |
CN110021037A (en) * | 2019-04-17 | 2019-07-16 | 南昌航空大学 | A kind of image non-rigid registration method and system based on generation confrontation network |
CN110298871A (en) * | 2019-06-10 | 2019-10-01 | 东软医疗系统股份有限公司 | Method for registering images and device |
CN111047629A (en) * | 2019-11-04 | 2020-04-21 | 中国科学院深圳先进技术研究院 | Multi-modal image registration method and device, electronic equipment and storage medium |
CN111210465A (en) * | 2019-12-31 | 2020-05-29 | 上海联影智能医疗科技有限公司 | Image registration method and device, computer equipment and readable storage medium |
Non-Patent Citations (1)
Title |
---|
CHEN AIGUO: "Educational Neuroscience: Innovation of Physical Education in Its Perspective", 30 December 2016 *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112598718A (en) * | 2020-12-31 | 2021-04-02 | 北京深睿博联科技有限责任公司 | Unsupervised multi-view multi-mode intelligent glasses image registration method and device |
CN112907639A (en) * | 2021-01-20 | 2021-06-04 | 云南电网有限责任公司电力科学研究院 | X-ray image registration method for power equipment |
CN112907639B (en) * | 2021-01-20 | 2024-04-26 | 云南电网有限责任公司电力科学研究院 | Power equipment X-ray image registration method |
WO2023283795A1 (en) * | 2021-07-12 | 2023-01-19 | Shanghaitech University | Method for high-resolution image reconstruction |
CN113487657A (en) * | 2021-07-29 | 2021-10-08 | 广州柏视医疗科技有限公司 | Deep learning-based mode conversion method |
CN113487657B (en) * | 2021-07-29 | 2022-02-01 | 广州柏视医疗科技有限公司 | Deep learning-based mode conversion method |
WO2023005186A1 (en) * | 2021-07-29 | 2023-02-02 | 广州柏视医疗科技有限公司 | Modal transformation method based on deep learning |
EP4394796A1 (en) * | 2022-12-29 | 2024-07-03 | Tempus AI, Inc. | Multimodal cardio disease state predictions combining electrocardiogram, echocardiogram, clinical and demographical information relating to a patient |
CN116402865A (en) * | 2023-06-06 | 2023-07-07 | 之江实验室 | Multi-mode image registration method, device and medium using diffusion model |
CN116402865B (en) * | 2023-06-06 | 2023-09-15 | 之江实验室 | Multi-mode image registration method, device and medium using diffusion model |
Also Published As
Publication number | Publication date |
---|---|
CN112102385B (en) | 2023-02-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112102385B (en) | Multi-modal liver magnetic resonance image registration system based on deep learning | |
CN109166133B (en) | Soft tissue organ image segmentation method based on key point detection and deep learning | |
Mahapatra et al. | Image super-resolution using progressive generative adversarial networks for medical image analysis | |
CN110930416B (en) | MRI image prostate segmentation method based on U-shaped network | |
CN110555835B (en) | Brain slice image region division method and device | |
CN113826143A (en) | Feature point detection | |
CN113239755B (en) | Medical hyperspectral image classification method based on space-spectrum fusion deep learning | |
Mahapatra et al. | Progressive generative adversarial networks for medical image super resolution | |
CN108764342B (en) | Semantic segmentation method for optic discs and optic cups in fundus image | |
He et al. | Few-shot learning for deformable medical image registration with perception-correspondence decoupling and reverse teaching | |
CN111488912B (en) | Laryngeal disease diagnosis system based on deep learning neural network | |
CN112150564B (en) | Medical image fusion algorithm based on deep convolution neural network | |
CN115830016B (en) | Medical image registration model training method and equipment | |
CN112488971A (en) | Medical image fusion method for generating countermeasure network based on spatial attention mechanism and depth convolution | |
CN112329844A (en) | Image object classification method and related device, equipment and storage medium | |
CN108665474B (en) | B-COSFIRE-based retinal vessel segmentation method for fundus image | |
Li et al. | A dense connection encoding–decoding convolutional neural network structure for semantic segmentation of thymoma | |
CN115393239A (en) | Multi-mode fundus image registration and fusion method and system | |
Wu et al. | Automatic symmetry detection from brain MRI based on a 2-channel convolutional neural network | |
Feng et al. | Retinal mosaicking with vascular bifurcations detected on vessel mask by a convolutional network | |
Zhao et al. | Attention residual convolution neural network based on U-net (AttentionResU-Net) for retina vessel segmentation | |
CN117422788B (en) | Method for generating DWI image based on CT brain stem image | |
CN113362360B (en) | Ultrasonic carotid plaque segmentation method based on fluid velocity field | |
Li et al. | Deep attention super-resolution of brain magnetic resonance images acquired under clinical protocols | |
Liu et al. | Weakly-supervised localization and classification of biomarkers in OCT images with integrated reconstruction and attention |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||