CN111260705B - Prostate MR image multi-task registration method based on deep convolutional neural network - Google Patents

Prostate MR image multi-task registration method based on deep convolutional neural network

Info

Publication number
CN111260705B
Authority
CN
China
Prior art keywords
image
registration
vector field
displacement vector
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010030035.5A
Other languages
Chinese (zh)
Other versions
CN111260705A (en)
Inventor
杜博
廖健东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010030035.5A priority Critical patent/CN111260705B/en
Publication of CN111260705A publication Critical patent/CN111260705A/en
Application granted granted Critical
Publication of CN111260705B publication Critical patent/CN111260705B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/38Registration of image sequences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/344Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30081Prostate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The invention discloses a prostate MR image multi-task registration method based on a deep convolutional neural network, comprising a training stage and an inference stage. The training stage comprises preprocessing of the prostate MR images, construction of the neural network model, and training of the network parameters. By expanding the one-way registration task into multiple tasks, a jointly trained multi-task model is obtained: the prostate label information serves as weak supervision to guide network training, cycle consistency and inverse consistency constraints restrain the training, and a dual-path deep convolutional neural network is constructed so that the network weights are shared. The displacement vector field predicted by the network is also regularized to make it smoother. In the inference stage, the preprocessed moving image and reference image are fed to the trained network to obtain a predicted displacement vector field, which is applied to the moving image to produce the prostate MR image registration result.

Description

Prostate MR image multi-task registration method based on deep convolutional neural network
Technical Field
The invention belongs to the technical field of medical image processing, and particularly relates to a prostate MR image multi-task registration method based on a deep convolutional neural network.
Background
With the development of computer technology and medical imaging engineering, more and more scientific technologies are applied to modern medicine. Medical imaging in particular permeates clinical practice: various imaging devices reflect the physical condition of the human body from different perspectives and provide direct, objective information for diagnosis and treatment. Common modern imaging techniques, such as magnetic resonance imaging (MRI), X-ray imaging, ultrasound imaging, computed tomography (CT) and positron emission tomography (PET), fall into two main categories: anatomical and functional imaging. Anatomical images have high resolution and accurately capture organ structure, but carry no functional information; functional images have low resolution and cannot clearly show organ contours, but reveal the metabolic state of the body. Although both kinds of imaging keep improving and their results become ever more accurate, in clinical practice each technique, owing to its imaging principle, reflects only certain specific information about the body, and doctors often have to combine several techniques to diagnose and treat a patient. Integrating information from multiple medical images, however, requires years of experience, is subject to subjective factors, and increases doctors' workload.
The best technology for solving this problem is medical image registration. Medical image registration is an important research branch of medical image analysis, the core technology of medical image fusion and reconstruction, and of great significance in clinical application. It refers to geometrically aligning two or more images so that the same pixel represents the same anatomical location. Through registration, multiple imaging techniques can be combined organically and their information integrated into one image, assisting doctors in diagnosis and treatment more intuitively and accurately.
Image registration techniques can be divided into rigid and non-rigid registration. In the 1980s, medical image registration was dominated by rigid-body registration, in which rigid transformation coefficients between images are estimated from medical information such as gray-level differences. Rigid-body registration algorithms are now mature and widely applied clinically, but they have few degrees of freedom and solve only a small part of the registration problem. For practical medical applications, rigid transformations are far from sufficient, and more degrees of freedom are needed. Since the beginning of the 21st century, non-rigid registration has been a research focus and has developed rapidly; many non-linear transformation models have been proposed, such as free-form deformation models based on B-splines, elastic transformation models, and physical deformation models based on optical-flow diffusion. A classical non-rigid registration algorithm usually first selects a transformation model, then defines a similarity measure, and finally optimizes the transformation parameters iteratively. Although classical algorithms achieve good performance, the iterative optimization makes them slow, unable to meet clinical real-time requirements, and prone to falling into local optima. In addition, different similarity measures have different properties, so different measures must be defined for different organs or images.
In recent years, deep learning, particularly deep convolutional neural networks, has achieved breakthrough results in computer vision and has also developed rapidly in medical image analysis; deep convolutional neural networks have likewise produced promising results in medical image registration, for example DIRNet and VoxelMorph. Although training such a model takes a long time, once training is finished the model can register images rapidly, which better meets clinical requirements. Current deep-learning-based algorithms achieve satisfactory results, but the existing methods still have shortcomings: the registration result is closely tied to the size of the training data set, and the physical properties of the displacement vector field are ignored.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a high-precision and high-efficiency multitask registration method for a prostate MR image based on a deep convolutional neural network.
In order to solve the technical problems, the invention adopts the following technical scheme: a prostate MR image multi-task registration method based on a deep convolutional neural network comprises the following steps:
step 1, carrying out uniform format processing and screening on prostate MR images with different formats;
step 2, cutting and normalizing the image in a pixel range, counting the voxel space distribution of the image, and resampling the image to unify the voxel space;
step 3, designing a double-path deep convolution neural network model to realize the registration of a plurality of images, wherein the specific processing process of the network model is as follows,
in the encoding stage, feature information of the moving image and the reference image at different scales is extracted through the convolution layers and down-sampling layers of two separate paths; that is, the moving image and the reference image are handled by an upper branch and a lower branch, the upper branch taking the moving image as input and the lower branch the reference image; the two branches have identical structural parameters and operations, each comprising 10 convolution layers and 5 down-sampling layers, with one down-sampling layer after every two convolution layers; in the decoding stage, the features learned in the encoding stage are decoded: the feature maps of the upper and lower branches are first superposed, and the features of the two branches are then fused through 10 convolution layers and 5 up-sampling layers, with one up-sampling layer after every two convolution layers, so that the displacement vector field output after decoding has the same size as the original image; meanwhile, the feature map obtained after each up-sampling is superposed, through skip connections, with the encoding-stage feature maps of the upper and lower branches at the same scale; the last layer uses a 1 × 1 convolution to reduce the dimensionality of the data and output the final displacement vector field;
step 4, guiding the training of the dual-path deep convolutional neural network model by using the label information of the prostate as weak supervision information, and regularizing the displacement vector field;
and 5, inputting the moving image and the reference image into the trained network model to obtain a predicted displacement vector field, and resampling the moving image by using the displacement vector field to obtain a corresponding registration result.
Further, in step 3, all convolution kernels in the encoding stage are 3 × 3, the convolution stride of the convolution layers is 1 and that of the down-sampling layers is 2, and the number of feature channels of the convolution layers doubles after each down-sampling layer, being 32, 64, 128, 256 and 512 respectively; in the decoding stage, all convolution kernels are likewise 3 × 3, the convolution stride of the convolution layers is 1 and that of the up-sampling layers is 2, and the number of feature channels of the convolution layers is halved after each up-sampling layer, being 512, 256, 128, 64 and 32 respectively.
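For concreteness, a minimal tf.keras sketch of such a dual-path encoder-decoder is given below. It follows the layer counts, kernel sizes, strides and channel numbers stated above, but the 2D input shape, the choice of transposed convolutions for up-sampling and all names are illustrative assumptions rather than the patent's exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers


def conv_block(x, filters):
    # two 3x3 convolutions with stride 1, as described in step 3
    for _ in range(2):
        x = layers.Conv2D(filters, 3, strides=1, padding="same", activation="relu")(x)
    return x


def build_dual_path_net(shape=(288, 288, 1), channels=(32, 64, 128, 256, 512)):
    moving = tf.keras.Input(shape, name="moving")     # upper branch input
    fixed = tf.keras.Input(shape, name="reference")   # lower branch input

    skips_m, skips_f = [], []
    xm, xf = moving, fixed
    for c in channels:                                # encoding stage: two separate paths
        xm, xf = conv_block(xm, c), conv_block(xf, c)
        skips_m.append(xm)
        skips_f.append(xf)
        # down-sampling layer: 3x3 convolution with stride 2
        xm = layers.Conv2D(c, 3, strides=2, padding="same", activation="relu")(xm)
        xf = layers.Conv2D(c, 3, strides=2, padding="same", activation="relu")(xf)

    x = layers.Concatenate()([xm, xf])                # fuse the two branches at the bottleneck
    for c, sm, sf in zip(reversed(channels), reversed(skips_m), reversed(skips_f)):
        # up-sampling layer: transposed 3x3 convolution with stride 2
        x = layers.Conv2DTranspose(c, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, sm, sf])         # skip connection to both encoder branches
        x = conv_block(x, c)

    dvf = layers.Conv2D(2, 1, padding="same", name="dvf")(x)  # 1x1 conv -> 2-channel field
    return tf.keras.Model([moving, fixed], dvf)
```

Feature-map concatenation stands in for the superposition of feature maps described above; other fusion choices (e.g. addition) would also be consistent with the description.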
Furthermore, in step 4, multi-task interconnection is realized by applying cycle consistency and inverse consistency constraints during training, which is specifically realized as follows.
The multiple tasks refer to A → B → A′, B → A → B′, A′ → A → A″ and B′ → B → B″; taking A → B → A′ as an example, A denotes the moving image, B denotes the reference image, and A′ denotes the registered result image. The mathematical expressions of the cycle consistency constraint and the inverse consistency constraint are:
cycle consistency:
|A − A″|² + |B − B″|²
inverse consistency:
|Φ_AB − Φ_B′B|² + |Φ_BA − Φ_A′A|²
wherein Φ_AB denotes the displacement vector field predicted by the network model when registering image A to image B, Φ_B′B the field predicted when registering image B′ to image B, Φ_BA the field predicted when registering image B to image A, and Φ_A′A the field predicted when registering image A′ to image A.
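As a rough sketch of these two terms (assuming the fields and images are batched tensors of shape [batch, H, W, C]; the difference form of the inverse-consistency term follows the reconstruction above and is itself an assumption):

```python
import tensorflow as tf


def cycle_consistency_loss(A, A2, B, B2):
    # |A - A''|^2 + |B - B''|^2, averaged over pixels
    return tf.reduce_mean(tf.square(A - A2)) + tf.reduce_mean(tf.square(B - B2))


def inverse_consistency_loss(phi_AB, phi_B1B, phi_BA, phi_A1A):
    # penalize disagreement between the forward field and the field predicted
    # on the already-registered image (reconstructed form of the constraint)
    return (tf.reduce_mean(tf.square(phi_AB - phi_B1B)) +
            tf.reduce_mean(tf.square(phi_BA - phi_A1A)))
```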
Further, the regularization of the displacement vector field in step 4 includes range-constraint regularization, L2 regularization and an anti-folding constraint, wherein the range-constraint regularization sums a piecewise penalty over all pixels of the deformed pixel grid:
Σ_{p∈Ω} f_σ(h(p))
wherein p represents a pixel point, Ω represents all pixels in space, Φ(·) represents the generated displacement vector field, and h(·) represents the deformed pixel grid, obtained by adding the displacement vector field to the original coordinates; ∇Φ is the gradient of the displacement vector field, |∇Φ|² is the L2 regularization term, and f_σ is a piecewise function, where s represents the size of the pixel grid, i.e. the size of the image;
the anti-folding constraint uses a ReLU to penalize the positions where folds are created: wherever the folding indicator signals a fold the position is penalized, while all other values indicate no fold, i.e. no penalty.
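The exact piecewise function f_σ and the folding indicator appear only as formula images in the published text, so the sketch below uses common stand-ins consistent with the description: h(p) = p + Φ(p) for the deformed grid, squared finite differences for |∇Φ|², a hinge penalty on coordinates leaving [0, s − 1] for the range constraint, and a ReLU on negative deformed-grid spacing for the anti-folding term. These are assumptions, not the patent's exact formulas.

```python
import tensorflow as tf


def deformed_grid(dvf):
    # h(p) = p + Phi(p): add the displacement field to the original pixel coordinates
    h, w = tf.shape(dvf)[1], tf.shape(dvf)[2]
    ys, xs = tf.meshgrid(tf.range(h), tf.range(w), indexing="ij")
    grid = tf.cast(tf.stack([ys, xs], axis=-1), dvf.dtype)[tf.newaxis]  # [1, H, W, 2]
    return grid + dvf


def l2_smoothness(dvf):
    # |grad Phi|^2 via squared finite differences of the displacement field
    dy = dvf[:, 1:, :, :] - dvf[:, :-1, :, :]
    dx = dvf[:, :, 1:, :] - dvf[:, :, :-1, :]
    return tf.reduce_mean(tf.square(dy)) + tf.reduce_mean(tf.square(dx))


def range_penalty(dvf, size):
    # stand-in for f_sigma: penalize deformed coordinates that leave [0, size - 1]
    h = deformed_grid(dvf)
    return tf.reduce_mean(tf.nn.relu(-h) + tf.nn.relu(h - (size - 1.0)))


def antifold_penalty(dvf):
    # stand-in folding indicator: penalize positions where the deformed grid spacing
    # becomes negative, i.e. neighbouring grid points cross over
    h = deformed_grid(dvf)
    dy = h[:, 1:, :, 0] - h[:, :-1, :, 0]
    dx = h[:, :, 1:, 1] - h[:, :, :-1, 1]
    return tf.reduce_mean(tf.nn.relu(-dy)) + tf.reduce_mean(tf.nn.relu(-dx))
```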
The invention has the beneficial effects that:
(1) the invention provides a multi-task registration network, which expands a single task into four tasks, increases the data volume to a certain extent, relieves the problem of small available data volume of medical images, and further improves the registration accuracy.
(2) The invention provides an adaptive registration algorithm based on anatomical constraints. The anatomical label of an image represents its high-level information; the correspondence between images is learned from this high-level information instead of training the model on image similarity alone. A Gaussian matrix is used to map the labels to spatially smoothed probability maps, with different label information corresponding to different mapping results; these mapping results serve as weak supervision, reducing network overfitting, guiding the training of the registration network, and alleviating the problems of large deformation and difficult registration between image pairs.
(3) The invention provides a double-path convolutional neural network model, which is characterized in that the characteristic information of a moving image and a reference image is respectively learned through two paths in an encoding stage, the characteristic information of the images can be better utilized through jump connection and multi-scale fusion in a decoding stage, and meanwhile, weight sharing can be realized through a double-path network, so that the consumption of resources is reduced.
(4) The invention provides cycle consistency constraint and inverse consistency constraint in the multitask registration of the prostate MR image of the deep convolutional neural network, guides the cooperative training among the four tasks, and further improves the registration precision by carrying out range constraint on a predicted displacement vector field.
Drawings
FIG. 1 is a diagram of the multitask registration of the present invention;
FIG. 2 is a diagram of a dual-path convolutional neural network structure according to the present invention.
Detailed Description
For the convenience of those skilled in the art to understand and implement the technical solution of the present invention, the following detailed description of the present invention is provided in conjunction with the accompanying drawings and examples, it is to be understood that the embodiments described herein are only for illustrating and explaining the present invention and are not to be construed as limiting the present invention.
The invention discloses a prostate MR image multi-task registration method based on a deep convolutional neural network, comprising a training stage and an inference stage. The training stage comprises preprocessing of the prostate MR images, construction of the neural network model, and training of the network parameters. The data preprocessing stage comprises data screening, unification of the storage format, pixel normalization and clipping, and image resampling. By expanding the one-way registration task into multiple tasks, a jointly trained multi-task model is obtained: the prostate label information serves as weak supervision to guide network training, cycle consistency and inverse consistency constraints restrain the training, and a dual-path deep convolutional neural network is constructed so that the network weights are shared. The displacement vector field predicted by the network is also regularized to make it smoother. In the inference stage, the preprocessed moving image and reference image are fed to the trained network to obtain a predicted displacement vector field, which is applied to the moving image to produce the prostate MR image registration result.
The implementation uses a Python platform based on the TensorFlow library, with a Python medical-image read/write function as its basis. The medical-image reading function is called with the file name of the medical image and reads it into a matrix of size X × Y × Z, where each element of the matrix is the pixel value at the corresponding position, Z is the number of slices of the medical image, and X and Y are its length and width respectively. Python medical-image read/write functions are well known in the art and are not described here.
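The text only states that a Python medical-image read/write function is used; the sketch below assumes SimpleITK as one such library and shows how an .mhd volume could be read into an X × Y × Z array.

```python
import numpy as np
import SimpleITK as sitk


def read_mr_volume(filename):
    # read an .mhd (or other ITK-supported) MR volume from disk
    image = sitk.ReadImage(filename)
    array = sitk.GetArrayFromImage(image)   # NumPy array in (Z, Y, X) order
    spacing = image.GetSpacing()            # voxel spacing (x, y, z) in mm
    # reorder to X x Y x Z to match the matrix layout described in the text
    return np.transpose(array, (2, 1, 0)), spacing
```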
In an embodiment, the following operations are performed on the medical image based on the matrix X × Y × Z:
(1) Uniformly formatting and screening the prostate MR images of different formats.
In the embodiment, the specific operation of step (1) is as follows: because the prostate MR image files acquired by different devices have inconsistent formats, they are first unified into the mhd file format. Meanwhile, some acquired images may not contain the prostate, so the images are screened using the prostate anatomy labels and only images containing the prostate are kept, which facilitates later training.
(2) Clipping and normalizing the image pixel range, counting the voxel-spacing distribution of the images, and resampling the images to unify the voxel spacing.
In the embodiment, the specific operation of step (2) is as follows: the imaging principle of medical images differs from that of natural images, and the gray values of the acquired data set span a large range, so the gray values are clipped and only pixel values between the 0.5th and 99.5th percentiles are kept, which reduces outliers and increases image contrast. The images are then normalized, which facilitates training of the network model. In addition, different imaging protocols often lead to inconsistent voxel spacing between images, i.e. the physical distance represented by two pixels differs, so the voxel spacing of the data is counted and unified by resampling the images.
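A sketch of this preprocessing, assuming NumPy and SimpleITK; the in-plane spacing value is taken from the experiments described below, while the slice spacing used here is purely illustrative.

```python
import numpy as np
import SimpleITK as sitk


def clip_and_normalize(volume):
    # keep only intensities between the 0.5th and 99.5th percentiles, then scale to [0, 1]
    lo, hi = np.percentile(volume, [0.5, 99.5])
    volume = np.clip(volume, lo, hi)
    return (volume - lo) / (hi - lo + 1e-8)


def resample_to_spacing(image, new_spacing=(0.3516, 0.3516, 3.0)):
    # resample a SimpleITK image so that all cases share the same voxel spacing
    # (in-plane spacing from the experiments; the slice spacing here is illustrative)
    old_spacing, old_size = image.GetSpacing(), image.GetSize()
    new_size = [int(round(osz * osp / nsp))
                for osz, osp, nsp in zip(old_size, old_spacing, new_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), new_spacing, image.GetDirection(),
                         0.0, image.GetPixelID())
```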
(3) Designing a dual-path deep convolutional neural network model to realize multi-task registration.
In the embodiment, the specific operation of step (3) is as follows: in medical image processing, the amount of medical image data is often the key to training a network model. Current models based on deep convolutional neural networks consider only the one-way registration process A → B → A′ from a moving image (A) to a reference image (B), where A′ is the registration result, and ignore the inverse consistency and cycle consistency of the image deformation. The one-way registration is therefore expanded, as shown in FIG. 1, into four tasks: A → B → A′, B → A → B′, A′ → A → A″ and B′ → B → B″; taking A → B → A′ as an example, A denotes the moving image, B the reference image, and A′ the registered result image. To realize weight sharing across the task network, a dual-path convolutional neural network is designed, as shown in FIG. 2. In the encoding stage, feature information of the moving image and the reference image at different scales is extracted through the convolution layers and down-sampling layers of two separate paths; that is, the moving image and the reference image are handled by an upper branch and a lower branch, the upper branch taking the moving image as input and the lower branch the reference image. The two branches have identical structural parameters and operations, each comprising 10 convolution layers and 5 down-sampling layers; all convolution kernels are 3 × 3, the convolution stride of the convolution layers is 1 and that of the down-sampling layers is 2. A down-sampling layer follows every two convolution layers, and the number of feature channels of the convolution layers doubles after each down-sampling layer, being 32, 64, 128, 256 and 512 respectively. The decoding stage mainly decodes the features learned in the encoding stage: the feature maps of the upper and lower branches are first superposed, and the features of the two branches are then fused through 10 convolution layers and 5 up-sampling layers, so that the displacement vector field output after decoding has the same size as the original image. Meanwhile, the feature map obtained after each up-sampling is superposed, through skip connections, with the encoding-stage feature maps of the upper and lower branches at the same scale; for example, the feature-map scale before the first down-sampling of the two branches is the same as that after the fifth up-sampling, so the low-level feature maps compensate for the information lost during up-sampling. The number of feature channels is halved after each up-sampling layer, being 512, 256, 128, 64 and 32 respectively, corresponding to the encoding stage; the convolution kernels are 3 × 3, the convolution stride of the convolution layers is 1 and that of the up-sampling layers is 2. The last layer uses a 1 × 1 convolution to reduce the dimensionality of the data and output the final displacement vector field. The displacement vector field is a two-channel matrix of the original image size, whose channels represent the distance each pixel moves along the x axis and along the y axis respectively.
(4) The label information of the prostate is used as weak supervision to guide network training, cycle consistency and inverse consistency constraints are used to interconnect the multiple tasks, and the displacement vector field is regularized.
In the embodiment, the specific operation of step (4) is as follows: because the two acquisitions are separated by a long time interval, the deformation between the images is large and it is difficult to train the network directly with a similarity measure, so the anatomical label of the prostate is used as prior knowledge to guide training. Using the anatomical label directly as the weak supervision information, however, would cause the network to overfit and is not conducive to training. The labels are therefore mapped to spatially smoothed probability maps with a Gaussian matrix whose size is determined by the minimal bounding square of the prostate anatomy; different label information corresponds to different mapping results, and these mapping results are used as the weak supervision information to guide the training of the registration network (a sketch of this label smoothing is given after the constraint definitions below). Meanwhile, the latent associations among the four tasks are exploited by designing a cycle consistency constraint and an inverse consistency constraint, whose mathematical expressions are as follows:
cycle consistency:
|A − A″|² + |B − B″|²
inverse consistency:
|Φ_AB − Φ_B′B|² + |Φ_BA − Φ_A′A|²
wherein Φ_AB denotes the displacement vector field predicted by the network when registering image A to image B, and Φ_B′B, Φ_BA and Φ_A′A are defined analogously.
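A sketch of the Gaussian label-to-probability-map smoothing described above, assuming SciPy; the way the kernel width σ is tied to the bounding-square size is an assumption, since the patent does not give the kernel parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smooth_label_map(label_mask, sigma_fraction=0.1):
    # label_mask: binary prostate anatomy label (H x W)
    ys, xs = np.nonzero(label_mask)
    side = max(ys.max() - ys.min(), xs.max() - xs.min()) + 1   # minimal bounding square side
    sigma = sigma_fraction * side                              # kernel width tied to label size (assumed)
    prob = gaussian_filter(label_mask.astype(np.float32), sigma=sigma)
    return prob / (prob.max() + 1e-8)                          # smoothed probability map in [0, 1]
```

The smoothed map of the warped moving label can then be compared with that of the reference label (for example with an overlap or squared-difference term) as the weak-supervision loss; the exact form of that comparison is likewise not fixed by the text.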
Meanwhile, range-constraint regularization is proposed, which sums a piecewise penalty over all pixels of the deformed pixel grid:
Σ_{p∈Ω} f_σ(h(p))
wherein p represents a pixel point, Ω represents all pixels in the space, Φ(·) represents the generated displacement vector field, and h(·) represents the deformed pixel grid, obtained by adding the displacement vector field to the original coordinates; ∇Φ is the gradient of the displacement vector field, |∇Φ|² is the L2 regularization term, and f_σ is a piecewise function, where s represents the size of the pixel grid, i.e. the size of the image.
In addition, L2 regularization and an anti-folding constraint are used. The anti-folding constraint applies a ReLU so that positions where a fold is created are penalized, while all other positions incur no penalty. The details are well known in the art and are not described here. The network is trained with the above constraint functions: the image pairs A, B and B, A are input first, followed by A′, A and B′, B; training runs for 200 rounds with the learning rate set to 10e-4.
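Pulling the pieces together, one possible joint training step over the four tasks might look like the sketch below. It reuses the loss sketches given earlier, assumes a warp(image, dvf) resampling function (one possible implementation is sketched under step (5) below), and uses assumed loss weights; only the learning-rate value is taken from the text.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=10e-4)   # learning rate as stated in the text


@tf.function
def train_step(net, warp, A, B, label_A, label_B,
               weights=(1.0, 1.0, 1.0, 0.1, 1.0)):          # loss weights are assumed
    w_sup, w_cyc, w_inv, w_reg, w_range = weights
    with tf.GradientTape() as tape:
        phi_AB = net([A, B]); A1 = warp(A, phi_AB)           # A -> B -> A'
        phi_BA = net([B, A]); B1 = warp(B, phi_BA)           # B -> A -> B'
        phi_A1A = net([A1, A]); A2 = warp(A1, phi_A1A)       # A' -> A -> A''
        phi_B1B = net([B1, B]); B2 = warp(B1, phi_B1B)       # B' -> B -> B''

        # weak supervision on the smoothed prostate label maps
        sup = (tf.reduce_mean(tf.square(warp(label_A, phi_AB) - label_B)) +
               tf.reduce_mean(tf.square(warp(label_B, phi_BA) - label_A)))
        cyc = cycle_consistency_loss(A, A2, B, B2)
        inv = inverse_consistency_loss(phi_AB, phi_B1B, phi_BA, phi_A1A)
        fields = (phi_AB, phi_BA, phi_A1A, phi_B1B)
        reg = sum(l2_smoothness(p) + antifold_penalty(p) for p in fields)
        rng = sum(range_penalty(p, 288.0) for p in fields)   # 288: image size from the experiments
        loss = w_sup * sup + w_cyc * cyc + w_inv * inv + w_reg * reg + w_range * rng
    grads = tape.gradient(loss, net.trainable_variables)
    optimizer.apply_gradients(zip(grads, net.trainable_variables))
    return loss
```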
(5) Inputting the moving image and the reference image into the dual-path deep convolutional neural network model to obtain a predicted displacement vector field, and resampling the moving image with the displacement vector field to obtain the corresponding registration result.
In the embodiment, the specific operation of step (5) is as follows: the moving image and the reference image are input into the registration network, which predicts the corresponding displacement vector field; applying this field to the moving image gives the position of each pixel after displacement. Because the displaced positions generally do not fall on integer coordinates, resampling (interpolation) is required to obtain the final registration result.
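A bilinear resampling sketch showing how a predicted displacement vector field could be applied to the moving image (2D, pure TensorFlow; the function name and the border-clamping behaviour are illustrative assumptions).

```python
import tensorflow as tf


def warp(image, dvf):
    """Resample `image` [B, H, W, C] at positions p + Phi(p) given by `dvf` [B, H, W, 2]."""
    h, w = tf.shape(image)[1], tf.shape(image)[2]
    ys, xs = tf.meshgrid(tf.range(h), tf.range(w), indexing="ij")
    grid = tf.cast(tf.stack([ys, xs], axis=-1), dvf.dtype)[tf.newaxis]  # original coordinates
    pos = grid + dvf                                     # displaced (generally non-integer) positions

    y0, x0 = tf.floor(pos[..., 0]), tf.floor(pos[..., 1])
    wy, wx = pos[..., 0] - y0, pos[..., 1] - x0          # bilinear interpolation weights

    def gather(y, x):
        # clamp to the image border and read the pixel values at integer coordinates
        y = tf.clip_by_value(tf.cast(y, tf.int32), 0, h - 1)
        x = tf.clip_by_value(tf.cast(x, tf.int32), 0, w - 1)
        return tf.gather_nd(image, tf.stack([y, x], axis=-1), batch_dims=1)

    top = gather(y0, x0) * (1 - wx)[..., None] + gather(y0, x0 + 1) * wx[..., None]
    bot = gather(y0 + 1, x0) * (1 - wx)[..., None] + gather(y0 + 1, x0 + 1) * wx[..., None]
    return top * (1 - wy)[..., None] + bot * wy[..., None]
```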
In a specific implementation, the above process can be run automatically in software, and an apparatus running the process also falls within the protection scope of the present invention.
The advantageous effects of the present invention are verified by comparative experiments as follows.
The data used in this experiment are prostate MR images from 72 patients, each scanned twice with an interval of up to one year; a rectal coil was used in the first acquisition for better observation of the focus, and no rectal coil was used in the second acquisition to increase patient comfort. To train the network better, the data set was augmented and a 2D training mode was used; the image size is 288 × 288 with a voxel spacing of 0.3516 mm. The method is compared with the state-of-the-art classical registration method SyN (symmetric normalization), the convolutional-neural-network-based DIRNet, and the weakly supervised registration algorithm Reg_Anat, with the proposed method serving as the specific embodiment.
Medical image registration evaluation indices: Dice and HD.
In the field of image registration, owing to the lack of a gold standard, registration is generally evaluated with segmentation-based indices: the Dice coefficient and the Hausdorff distance (HD). A higher Dice indicates a larger overlap of the regions of interest, i.e. a better result; the Hausdorff distance mainly evaluates how well the edges of the regions of interest match, and a lower HD indicates a better registration.
Dice(A, B) = 2|A ∩ B| / (|A| + |B|)
wherein A and B respectively represent the region-of-interest anatomical labels, and |A ∩ B| represents the intersection of A and B.
HD(A, B) = max{ max_{a∈A} min_{b∈B} d(a, b), max_{b∈B} min_{a∈A} d(a, b) }
where a and b are points of set A and set B respectively, and d(a, b) represents the distance between a and b.
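The two indices can be computed, for example, as follows (NumPy/SciPy sketch; directed_hausdorff is one way to obtain the symmetric Hausdorff distance).

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def dice(a, b):
    # a, b: binary anatomy label masks of the region of interest
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-8)


def hausdorff(a, b):
    # symmetric Hausdorff distance between the two label point sets (in pixels)
    pa, pb = np.argwhere(a), np.argwhere(b)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])
```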
TABLE 1 Comparative test results

Metric   Method of the invention   SyN                DIRNet              Reg_Anat
Dice     86.36 (±0.0069)           83.19 (±0.0083)    80.28 (±0.0097)     84.89 (±0.0032)
HD       6.3860 (±0.6019)          9.0284 (±0.5458)   13.3911 (±0.3920)   8.6822 (±0.3792)
The standard deviations are shown in parentheses. As can be seen from Table 1, the proposed method is optimal on both evaluation indices, indicating a better image registration effect. Compared with the best classical registration method, SyN, it shows a clear improvement; in terms of HD it handles image edges better, and it takes far less time than SyN, so it better meets clinical requirements. Compared with the deep-learning-based methods, it suppresses incorrect registrations more effectively, and the generated displacement vector field better conforms to physical properties.
It can be concluded that the method of the present invention has a higher registration accuracy compared to existing medical image registration methods. The method solves the problem that the registration training cannot be carried out by using a deep network due to insufficient training samples, realizes multi-task registration by using a dual-path convolution neural network, can increase a training data set, and further corrects the prediction of a displacement vector field by using the cyclic consistency and the inverse consistency. The invention provides range-constrained regularization, which can effectively inhibit wrong registration, reduce irregular deformation of images and further improve the registration precision.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. A prostate MR image multitask registration method based on a deep convolutional neural network is characterized by comprising the following steps:
step 1, carrying out uniform format processing and screening on prostate MR images with different formats;
step 2, cutting and normalizing the image in a pixel range, counting the voxel space distribution of the image, and resampling the image to unify the voxel space;
step 3, designing a double-path deep convolution neural network model to realize the registration of a plurality of images, wherein the specific processing process of the network model is as follows,
in the encoding stage, feature information of the moving image and the reference image at different scales is extracted through the convolution layers and down-sampling layers of two separate paths; that is, the moving image and the reference image are handled by an upper branch and a lower branch, the upper branch taking the moving image as input and the lower branch the reference image; the two branches have identical structural parameters and operations, each comprising 10 convolution layers and 5 down-sampling layers, with one down-sampling layer after every two convolution layers; in the decoding stage, the features learned in the encoding stage are decoded: the feature maps of the upper and lower branches are first superposed, and the features of the two branches are then fused through 10 convolution layers and 5 up-sampling layers, with one up-sampling layer after every two convolution layers, so that the displacement vector field output after decoding has the same size as the original image; meanwhile, the feature map obtained after each up-sampling is superposed, through skip connections, with the encoding-stage feature maps of the upper and lower branches at the same scale; the last layer uses a 1 × 1 convolution to reduce the dimensionality of the data and output the final displacement vector field;
step 4, guiding the training of the dual-path deep convolutional neural network model by using the label information of the prostate as weak supervision information, and regularizing the displacement vector field;
in step 4, multi-task interconnection is realized by applying cycle consistency and inverse consistency constraints during training, which is specifically realized as follows:
the multiple tasks refer to A → B → A′, B → A → B′, A′ → A → A″ and B′ → B → B″; taking A → B → A′ as an example, A denotes the moving image, B denotes the reference image, and A′ denotes the registered result image; the mathematical expressions of the cycle consistency constraint and the inverse consistency constraint are:
cycle consistency:
|A − A″|² + |B − B″|²
inverse consistency:
|Φ_AB − Φ_B′B|² + |Φ_BA − Φ_A′A|²
wherein Φ_AB denotes the displacement vector field predicted by the network model when registering image A to image B, Φ_B′B the field predicted when registering image B′ to image B, Φ_BA the field predicted when registering image B to image A, and Φ_A′A the field predicted when registering image A′ to image A;
and 5, inputting the moving image and the reference image into the trained network model to obtain a predicted displacement vector field, and resampling the moving image by using the displacement vector field to obtain a corresponding registration result.
2. The method for multi-task registration of prostate MR images based on a deep convolutional neural network as claimed in claim 1, wherein: in step 3, all convolution kernels in the encoding stage are 3 × 3, the convolution stride of the convolution layers is 1 and that of the down-sampling layers is 2, and the number of feature channels of the convolution layers doubles after each down-sampling layer, being 32, 64, 128, 256 and 512 respectively; in the decoding stage, all convolution kernels are likewise 3 × 3, the convolution stride of the convolution layers is 1 and that of the up-sampling layers is 2, and the number of feature channels of the convolution layers is halved after each up-sampling layer, being 512, 256, 128, 64 and 32 respectively.
3. The method for multi-task registration of prostate MR images based on a deep convolutional neural network as claimed in claim 1, wherein: the regularization of the displacement vector field in step 4 includes range-constraint regularization, L2 regularization and an anti-folding constraint, wherein the range-constraint regularization sums a piecewise penalty over all pixels of the deformed pixel grid:
Σ_{p∈Ω} f_σ(h(p))
wherein p represents a pixel point, Ω represents all pixels in space, Φ(·) represents the generated displacement vector field, and h(·) represents the deformed pixel grid, obtained by adding the displacement vector field to the original coordinates; ∇Φ is the gradient of the displacement vector field, |∇Φ|² is the L2 regularization term, and f_σ is a piecewise function, where s represents the size of the pixel grid, i.e. the size of the image;
the anti-folding constraint uses a ReLU to penalize the positions where folds are created: wherever the folding indicator signals a fold the position is penalized, while all other values indicate no fold, i.e. no penalty.
CN202010030035.5A 2020-01-13 2020-01-13 Prostate MR image multi-task registration method based on deep convolutional neural network Active CN111260705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010030035.5A CN111260705B (en) 2020-01-13 2020-01-13 Prostate MR image multi-task registration method based on deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010030035.5A CN111260705B (en) 2020-01-13 2020-01-13 Prostate MR image multi-task registration method based on deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN111260705A CN111260705A (en) 2020-06-09
CN111260705B true CN111260705B (en) 2022-03-15

Family

ID=70946880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010030035.5A Active CN111260705B (en) 2020-01-13 2020-01-13 Prostate MR image multi-task registration method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN111260705B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132878B (en) * 2020-11-03 2024-04-05 贵州大学 End-to-end brain nuclear magnetic resonance image registration method based on convolutional neural network
CN112634284B (en) * 2020-12-22 2022-03-25 上海体素信息科技有限公司 Weight map loss-based staged neural network CT organ segmentation method and system
CN112991406B (en) * 2021-02-07 2023-05-23 清华大学深圳国际研究生院 Method for constructing brain map based on differential geometry technology
CN113269815B (en) * 2021-05-14 2022-10-25 中山大学肿瘤防治中心 Deep learning-based medical image registration method and terminal
CN113344876B (en) * 2021-06-08 2023-05-12 安徽大学 Deformable registration method between CT and CBCT
CN114332447B (en) * 2022-03-14 2022-08-09 浙江大华技术股份有限公司 License plate correction method, license plate correction device and computer readable storage medium
CN116523983B (en) * 2023-06-26 2023-10-27 华南师范大学 Pancreas CT image registration method integrating multipath characteristics and organ morphology guidance

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101887525A (en) * 2010-07-09 2010-11-17 北京师范大学 Grading-based positive and inverse reversing three-dimensional dense point set rapid registering method
CN106875380A (en) * 2017-01-12 2017-06-20 西安电子科技大学 A kind of heterogeneous image change detection method based on unsupervised deep neural network
CN107194872A (en) * 2017-05-02 2017-09-22 武汉大学 Remote sensed image super-resolution reconstruction method based on perception of content deep learning network
CN107403446A (en) * 2016-05-18 2017-11-28 西门子保健有限责任公司 Method and system for the image registration using intelligent human agents
CN108596902A (en) * 2018-05-04 2018-09-28 北京大学 The full reference image quality appraisement method of multitask based on gating convolutional neural networks
CN109598727A (en) * 2018-11-28 2019-04-09 北京工业大学 A kind of CT image pulmonary parenchyma three-dimensional semantic segmentation method based on deep neural network
US10426442B1 (en) * 2019-06-14 2019-10-01 Cycle Clarity, LLC Adaptive image processing in assisted reproductive imaging modalities
CN110363797A (en) * 2019-07-15 2019-10-22 东北大学 A kind of PET and CT method for registering images inhibited based on excessive deformation
CN110473196A (en) * 2019-08-14 2019-11-19 中南大学 A kind of abdominal CT images target organ method for registering based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020510463A (en) * 2017-01-27 2020-04-09 アーテリーズ インコーポレイテッド Automated segmentation using full-layer convolutional networks
CN110599528B (en) * 2019-09-03 2022-05-27 济南大学 Unsupervised three-dimensional medical image registration method and system based on neural network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101887525A (en) * 2010-07-09 2010-11-17 北京师范大学 Grading-based positive and inverse reversing three-dimensional dense point set rapid registering method
CN107403446A (en) * 2016-05-18 2017-11-28 西门子保健有限责任公司 Method and system for the image registration using intelligent human agents
CN106875380A (en) * 2017-01-12 2017-06-20 西安电子科技大学 A kind of heterogeneous image change detection method based on unsupervised deep neural network
CN107194872A (en) * 2017-05-02 2017-09-22 武汉大学 Remote sensed image super-resolution reconstruction method based on perception of content deep learning network
CN108596902A (en) * 2018-05-04 2018-09-28 北京大学 The full reference image quality appraisement method of multitask based on gating convolutional neural networks
CN109598727A (en) * 2018-11-28 2019-04-09 北京工业大学 A kind of CT image pulmonary parenchyma three-dimensional semantic segmentation method based on deep neural network
US10426442B1 (en) * 2019-06-14 2019-10-01 Cycle Clarity, LLC Adaptive image processing in assisted reproductive imaging modalities
CN110363797A (en) * 2019-07-15 2019-10-22 东北大学 A kind of PET and CT method for registering images inhibited based on excessive deformation
CN110473196A (en) * 2019-08-14 2019-11-19 中南大学 A kind of abdominal CT images target organ method for registering based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Prostate MR Image Segmentation With Self-Attention Adversarial Training Based on Wasserstein Distance; CHENGWEI SU et al.; Special Section on Advanced Data Mining Methods for Social Computing; 2019-12-13; pp. 184276-184284 *
Segmentation of prostate MR images based on the deep learning network PSP-NET (基于深度学习网络PSP-NET的前列腺MR图像的分割); 范嵩 (Fan Song) et al.; Modern Electronics Technique (现代电子技术); 2019-06-15; Vol. 42, No. 12, pp. 148-151, 155 *

Also Published As

Publication number Publication date
CN111260705A (en) 2020-06-09

Similar Documents

Publication Publication Date Title
CN111260705B (en) Prostate MR image multi-task registration method based on deep convolutional neural network
CN110475505B (en) Automatic segmentation using full convolution network
US20230104173A1 (en) Method and system for determining blood vessel information in an image
EP4030385A1 (en) Devices and process for synthesizing images from a source nature to a target nature
CN117078692B (en) Medical ultrasonic image segmentation method and system based on self-adaptive feature fusion
CN112216371A (en) Multi-path multi-scale parallel coding and decoding network image segmentation method, system and medium
CN117274599A (en) Brain magnetic resonance segmentation method and system based on combined double-task self-encoder
Du et al. Segmentation and visualization of left atrium through a unified deep learning framework
CN115830016A (en) Medical image registration model training method and equipment
Liu et al. 3-D prostate MR and TRUS images detection and segmentation for puncture biopsy
CN112990359B (en) Image data processing method, device, computer and storage medium
CN116797612B (en) Ultrasonic image segmentation method and device based on weak supervision depth activity contour model
Jiang et al. Ori-net: Orientation-guided neural network for automated coronary arteries segmentation
CN113538209A (en) Multi-modal medical image registration method, registration system, computing device and storage medium
CN115205298B (en) Method and device for segmenting blood vessels of liver region
Tao et al. Tooth CT Image Segmentation Method Based on the U-Net Network and Attention Module.
CN116665896A (en) Model building method for predicting breast cancer axillary lymph node metastasis
CN113436155B (en) Deep learning-based ultrasonic brachial plexus image recognition method
CN116091412A (en) Method for segmenting tumor from PET/CT image
CN114529562A (en) Medical image segmentation method based on auxiliary learning task and re-segmentation constraint
CN115294023A (en) Liver tumor automatic segmentation method and device
CN111612762B (en) MRI brain tumor image generation method and system
CN113936006A (en) Segmentation method and device for processing high-noise low-quality medical image
Chen et al. Radiologically based automated segmentation of cardiac MRI using an improved U-Net neural algorithm
Peng et al. 2D brain MRI image synthesis based on lightweight denoising diffusion probabilistic model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant