CN114022521A - Non-rigid multi-mode medical image registration method and system - Google Patents


Info

Publication number
CN114022521A
CN114022521A (application CN202111192744.4A)
Authority
CN
China
Prior art keywords: image, registration, convolution, CNN network, medical image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111192744.4A
Other languages
Chinese (zh)
Inventor
Zhang Xuming (张旭明)
Wang Yibo (王一博)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202111192744.4A
Publication of CN114022521A
Legal status: Pending

Classifications

    • G06T 7/30 — Image analysis: determination of transform parameters for the alignment of images, i.e. image registration
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/08 — Neural networks: learning methods
    • G06T 5/70

Abstract

The invention belongs to the field of image registration in image processing and analysis, and discloses a registration method and system for non-rigid multi-modal medical images. The method comprises the following steps: (S1) train a first-level CNN network on original medical images and their structural representation maps, so that the trained first-level CNN network can generate a structural representation map from an input image; (S2) train a second-level CNN network on the structural representation maps of the reference, floating and label images, so that the trained second-level CNN network can estimate a deformation field; (S3) for the medical image to be registered, obtain the corresponding structural representation maps with the trained first-level CNN network; (S4) obtain the registered image for the medical image to be registered with the trained second-level CNN network. The invention is based on unsupervised deep learning, the network input does not distinguish the modality of the medical image, and it is particularly suitable for non-rigid registration of multi-modal medical images, with high registration accuracy.

Description

Non-rigid multi-mode medical image registration method and system
Technical Field
The invention belongs to the field of image registration in image processing and analysis, and particularly relates to a method and a system for registering a non-rigid multimode medical image, which are used for realizing the registration of the non-rigid multimode medical image based on unsupervised deep learning.
Background
Multi-modal medical image registration is a key technology in medical image analysis and plays an important role in various clinical applications. However, the gray-scale differences between images obtained by different imaging systems are large (for example, gray matter, white matter and cerebrospinal fluid have different gray levels under different weighting modalities in magnetic resonance imaging), and complex non-rigid deformation may also exist between the images; these factors make multi-modal medical image registration a very challenging problem.
Among conventional multi-modal medical image registration methods, the representative approaches are feature-based registration and gray-scale-based registration. Feature-based registration first obtains image features (such as points, lines and surfaces in an image) with a designed feature-extraction method and then optimizes a feature-based similarity measure to realize registration; its accuracy suffers greatly when sufficient corresponding feature information is hard to find between multi-modal medical images. Registration based on image gray-scale information constructs and optimizes a similarity measure based on gray-scale statistics between images (such as mutual information and regional mutual information); this approach is computationally complex and, because it neglects the local structural information of the images, its accuracy is hard to guarantee. To overcome these shortcomings, researchers have in recent years proposed registration algorithms based on image structure characterization, such as methods based on the entropy map (ESSD), the Weber local descriptor (WLD), the Modality Independent Neighbourhood Descriptor (MIND) and the Zernike Moment Local Descriptor (ZMLD). These methods convert the multi-modal images into single-modality characterization images, take the Sum of Squared Differences (SSD) of the characterization results as the similarity measure, and iteratively optimize it to realize registration.
In view of the shortcomings of conventional multi-modal image registration methods, researchers have in recent years proposed deep-learning-based registration methods, which fall mainly into two categories. The first extracts deep features with a deep learning model, constructs a registration objective function from those features, and then optimizes this objective function with an iterative strategy to obtain the parameters of the transformation model and thus the registration result. Typical representatives include registration methods based on the principal component analysis network PCANet and the Laplacian eigenmap network LENet; like conventional methods, they involve a time-consuming iterative optimization process, making efficient image registration difficult. The second category is end-to-end deep-learning registration, which directly estimates the deformation field with a supervised learning strategy or directly estimates the registration result with an unsupervised learning strategy. Balakrishnan et al. proposed the VoxelMorph (abbreviated Morph) image registration method, which builds the loss function from the difference between the registered image and the reference image, directly estimates the deformation field between the reference image and the floating image, and applies the field to the floating image with a spatial transformation layer to obtain the registered image. Hu et al. proposed a registration method based on a pyramid network structure, which extracts features from the image to be registered and the reference image separately, generates deformation fields layer by layer with the pyramid structure, and applies them to the floating image by grid sampling, obtaining registered images at several resolutions.
Overall, conventional multi-modal medical image registration methods usually realize registration by iteratively optimizing a similarity measure and can hardly register images in real time. Existing deep-learning-based methods optimize the registration process to some extent and reduce registration time, but the network structures they adopt struggle to extract medical-image features effectively, and they have been applied only to single-modality registration, falling short of the accuracy required for complex multi-modal images.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, an object of the present invention is to provide a registration method and system for non-rigid multi-modal medical images, in which the flow of the overall registration method, the configuration of the key two-stage convolutional neural network (CNN), and the structure of the corresponding registration system are improved so as to achieve fast and accurate medical image registration through the two-stage CNN network: the first-stage CNN (CNN_sp) learns to obtain the structural representation SR (structural representation), and the second-stage CNN (CNN_da) further estimates the deformation field from the structural representation generated by the first stage. The two-stage CNN network is based on unsupervised deep learning, is suitable for non-rigid registration of multi-modal medical images, and achieves high registration accuracy.
To achieve the above object, according to one aspect of the present invention, there is provided a non-rigid multi-modality medical image registration method, characterized by comprising the steps of:
(S1) Based on N1 original medical images I_o, obtain their structural representation maps SR_o through the image structure characterization algorithm, thereby obtaining N1 pairs of original medical image I_o and structural representation map SR_o as training-set data, and train the first-level CNN network with them, so that the trained first-level CNN network can generate a structural representation map from an input image;
(S2) Based on N2 medical images and the previously known reference image I_r, floating image I_f and label image I_l corresponding to them, first obtain the structural representation maps SR_r, SR_f and SR_l of the reference image I_r, floating image I_f and label image I_l respectively through the image structure characterization algorithm, then use them as training-set data to train the second-level CNN network, so that the trained second-level CNN network can estimate a deformation field;
(S3) Input the reference image and floating image of the medical image to be registered into the trained first-level CNN network respectively, obtaining the corresponding structural representation maps SR_r_g and SR_f_g;
(S4) Input the structural representation maps SR_r_g and SR_f_g of the reference image and floating image obtained in step (S3) into the trained second-level CNN network to obtain a deformation field; then apply the deformation field to the floating image of the medical image to be registered through space conversion, thereby obtaining the registered image for the medical image to be registered.
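The four steps above compose into a single pipeline. The sketch below is illustrative only: the names register, cnn_sp, cnn_da and warp are stand-ins for the trained first-level CNN, the trained second-level CNN and the spatial-transform module, not names from the patent.

```python
import numpy as np

def register(reference, floating, cnn_sp, cnn_da, warp):
    """Sketch of steps (S3)-(S4): characterize both images, estimate
    the deformation field, then warp the floating image."""
    sr_r_g = cnn_sp(reference)      # (S3) structural representation of I_r
    sr_f_g = cnn_sp(floating)       # (S3) structural representation of I_f
    phi = cnn_da(sr_r_g, sr_f_g)    # (S4) deformation field
    return warp(floating, phi)      # (S4) registered image

# Toy stand-ins: identity characterization, zero field, identity warp.
cnn_sp = lambda img: img
cnn_da = lambda a, b: np.zeros(a.shape + (2,))
warp = lambda img, phi: img  # a zero field leaves the image unchanged

img = np.arange(16.0).reshape(4, 4)
out = register(img, img, cnn_sp, cnn_da, warp)
```

With the stand-ins above, registering an image against itself returns it unchanged, which is the sanity check any such pipeline should pass.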
As a further preferred aspect of the present invention, in step (S1) and step (S2) the structural representation map is obtained by the image structure characterization algorithm, whose calculation formula is:

SR(I, x, m) = (1/n) · exp(−Md(I, x, m) / Sd(I, x)),  m ∈ Mr

wherein I denotes the target image of the image structure characterization algorithm, Md represents the filtering result of the image in a grid region of given side length r, and Sd represents the filtering result of the image in the four neighborhoods, which are defined as follows:

Md(I, x, m) = G(x) * Dis(I, x, x + m)

sd(x) = mean over m ∈ M0 of (G(x) * Dis(I, x, x + m))

Sd = (sd1 + sd2)²

wherein * denotes convolution; Mr represents a grid search region with side length r, where the value of r is preset; G(x) represents a Gaussian filter kernel of size 3 × 3 centered on pixel x; Dis is the sum of squared differences of two image blocks S in Mr centered on pixel x and its neighboring pixel x + m; n is the number of neighborhood elements; sd1 and sd2 are defined as follows:

sd1 = c1 · sd(x)

sd2 = c2 · mean(sd)

wherein c1 and c2 are preset constants, M0 represents the search region of the four neighborhoods, and mean(·) represents the mean operation.
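The patent's formula images are not reproduced in this text, so the sketch below follows one consistent MIND-style reading of the stated definitions: Dis as a Gaussian-weighted squared difference between a pixel and a shifted neighbor, Md as that distance over the grid search region, and Sd built from the four-neighborhood mean as (c1·sd + c2·mean(sd))². The values r = 1, c1 = 1.2, c2 = 1.5, the circular shift at the borders, and the per-offset channel stacking are all assumptions.

```python
import numpy as np

def gauss3x3():
    # 3x3 Gaussian kernel G, as stated in the definitions
    g = np.array([1.0, 2.0, 1.0])
    k = np.outer(g, g)
    return k / k.sum()

def filt3x3(img, k):
    # 'same' correlation with edge padding (boundary handling assumed)
    p = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def shift(img, dy, dx):
    # circular shift for simplicity; real borders would need care
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def sr(img, r=1, c1=1.2, c2=1.5):
    """One reading of SR(I, x, m) = (1/n) exp(-Md/Sd) over a grid region."""
    k = gauss3x3()
    grid = [(dy, dx) for dy in range(-r, r + 1)
                     for dx in range(-r, r + 1) if (dy, dx) != (0, 0)]
    four = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    # Gaussian-filtered patch SSD for every needed offset (Md per offset)
    dist = {m: filt3x3((img - shift(img, *m)) ** 2, k)
            for m in set(grid) | set(four)}
    sd = np.mean([dist[m] for m in four], axis=0)   # four-neighborhood mean
    Sd = (c1 * sd + c2 * sd.mean()) ** 2 + 1e-12    # Sd = (sd1 + sd2)^2, eps for stability
    n = len(grid)
    return np.stack([np.exp(-dist[m] / Sd) / n for m in grid], axis=-1)

rng = np.random.default_rng(0)
img = rng.random((16, 16))
rep = sr(img)  # one descriptor channel per grid offset
```

For r = 1 the grid region has eight non-center offsets, so the characterization has eight channels, each bounded in (0, 1/n].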
As a further preferred aspect of the present invention, in step (S1) the loss function L1 used for the training is defined as the mean square error (MSE) between the structural representation map SR_g generated by the first-level CNN network from the original medical image I_o in the training-set data and the corresponding structural representation map SR_o in the training-set data:

L1 = (1/Q) · ||SR_g − SR_o||F²

wherein ||·||F represents the Frobenius norm and Q the number of pixels in a two-dimensional image.
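This loss transcribes directly into NumPy; the function name below is illustrative.

```python
import numpy as np

def loss_l1(sr_g, sr_o):
    """L1 = (1/Q) * ||SR_g - SR_o||_F^2, with Q the number of pixels."""
    q = sr_g.size
    return np.linalg.norm(sr_g - sr_o, "fro") ** 2 / q

# ||diff||_F^2 = 4 over Q = 4 pixels -> loss 1.0
val = loss_l1(np.ones((2, 2)), np.zeros((2, 2)))
```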
As a further preferred aspect of the present invention, in step (S2) the loss function L2 used for the training is defined as follows:

L2 = LMSE + λ · LΦ

wherein LMSE represents the mean square error (MSE) between SR_l and the map SR_g formed by applying the deformation field Φ generated by the second-level CNN network to SR_f through spatial transformation; LΦ represents the L2 norm of the deformation field Φ generated by the second-level CNN network; λ is a preset constant used to balance LMSE and LΦ; LMSE and LΦ are respectively defined as follows:

LMSE = (1/Q) · ||SR_g − SR_l||F²

LΦ = (1/Q) · ||Φ||F²

wherein ||·||F represents the Frobenius norm and Q the number of pixels in a two-dimensional image.
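A sketch of the combined loss, assuming LΦ is the squared L2 norm of the field averaged over pixels (consistent with the Frobenius form above) and taking λ = 0.01 as an illustrative value; sr_warped stands for SR_g, i.e. Φ already applied to SR_f.

```python
import numpy as np

def loss_l2(sr_warped, sr_l, phi, lam=0.01):
    """L2 = L_MSE + lambda * L_Phi; phi has shape (H, W, 2)."""
    q = sr_l.size
    l_mse = np.sum((sr_warped - sr_l) ** 2) / q  # data term against SR_l
    l_phi = np.sum(phi ** 2) / q                 # L2 penalty on the field
    return l_mse + lam * l_phi

# perfect match + zero field -> 0; unit field alone -> lam * 8/4 = 0.02
zero = loss_l2(np.ones((2, 2)), np.ones((2, 2)), np.zeros((2, 2, 2)))
reg = loss_l2(np.ones((2, 2)), np.ones((2, 2)), np.ones((2, 2, 2)))
```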
As a further preferred aspect of the present invention, the first-level CNN network is a deep learning network comprising pyramid convolution (PyConv) and global average pooling (GAP) connected in sequence; the pyramid convolution (PyConv) extracts features of the input image, and the global average pooling (GAP) receives the features and forms the characterization-result output;

Preferably, the pyramid convolution (PyConv) is formed by stacking, in the order of the input filter sequence, standard convolution blocks defined as follows: each standard convolution block consists of a 2D convolution layer (Conv2D) with stride 1 and kernel size 3 × 3 and a nonlinear activation layer (ReLU).
As a further preferred aspect of the present invention, the second-level CNN network is a deep learning network comprising pyramid encoding (PyEncode), feature fusion (Merge) and attention decoding (AttDecode) connected in sequence; the pyramid encoding (PyEncode) extracts features of different scales from the input image, the feature fusion (Merge) fuses the corresponding-size features of the structural representation maps of the floating image and the reference image, and the attention decoding (AttDecode) produces the final deformation-field estimate;

Preferably,

the pyramid encoding (PyEncode) consists of five convolution blocks, wherein:

the first convolution block consists of a convolution layer (Conv) with kernel size 3 × 3 and stride 1, a batch normalization layer (BN) and a nonlinear activation layer (ReLU);

the second to fifth convolution blocks each contain a convolution layer (Conv) with kernel size 3 × 3 and stride 2, a batch normalization layer (BN) and a nonlinear activation layer (ReLU); except for the second convolution block, these blocks also contain a residual convolution (ResConv); the residual convolution (ResConv) forms a main path from three convolution layers (Conv) with kernel size 3 × 3 and stride 1, three batch normalization layers (BN) and a nonlinear activation layer (ReLU), and then merges (Concat) a direct-input shortcut (Shortcut) to form the output;

the third to fifth convolution blocks form the final feature maps through refinement (Refine); refinement (Refine) consists of an upsampling (Upsampling) layer, a convolution layer (Conv) with kernel size 3 × 3 and stride 2, and a convolution layer (Conv) with kernel size 3 × 3 and stride 1;

the feature fusion (Merge) consists of a convolution layer (Conv) with kernel size 3 × 3 and stride 1, a batch normalization layer (BN) and a nonlinear activation layer (ReLU);

the attention decoding (AttDecode) consists of attention convolution (AttConv), upsampling (Upsampling), concatenation (Concat) and convolution (Conv).
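Since the first encoder block keeps the input size (stride 1) and blocks two to five each halve it (stride 2), the encoder's feature-map sizes can be traced in a few lines. This shape bookkeeping is illustrative, not the patent's code.

```python
def pyencode_shapes(h, w):
    """Trace the five PyEncode block output sizes for an h x w input."""
    shapes = [(h, w)]          # block 1: stride 1, same size
    for _ in range(4):         # blocks 2-5: stride 2, halved each time
        h, w = h // 2, w // 2
        shapes.append((h, w))  # Feature 1/2^i for i = 1..4
    return shapes

shapes = pyencode_shapes(256, 256)
```

A 256 × 256 input thus yields feature maps at 256, 128, 64, 32 and 16 pixels per side.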
In a more preferred aspect of the present invention, in step (S1) N1 is an integer of 1000 or more;

in step (S2) N2 is an integer of 2000 or more.
According to another aspect of the present invention, there is provided a system for registration of non-rigid multi-modality medical images, comprising:
First-level CNN network: used to obtain the corresponding structural representation maps SR_r_g and SR_f_g from the reference image and floating image of the medical image to be registered;

the first-level CNN network is trained, with the training-set data specifically as follows: based on N1 original medical images I_o, their structural representation maps SR_o are obtained through the image structure characterization algorithm, thereby obtaining N1 pairs of original medical image I_o and structural representation map SR_o as training-set data;

Second-level CNN network: used to obtain a deformation field from the structural representation maps SR_r_g and SR_f_g of the reference image and floating image produced by the first-level CNN network;

the second-level CNN network is trained, with the training-set data specifically as follows: based on N2 medical images and the previously known reference image I_r, floating image I_f and label image I_l corresponding to them, the structural representation maps SR_r, SR_f and SR_l of the reference image I_r, floating image I_f and label image I_l are respectively obtained through the image structure characterization algorithm and used as training-set data;

Space conversion module: applies the deformation field obtained by the second-level CNN network to the floating image of the medical image to be registered through space conversion, thereby obtaining the registered image for the medical image to be registered.
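A minimal NumPy sketch of the space conversion module, assuming the deformation field stores per-pixel row/column displacements; the patent does not specify the sampling scheme, so bilinear interpolation with edge clamping is an assumption here.

```python
import numpy as np

def warp(img, phi):
    """Apply deformation field phi (H, W, 2) to img by bilinear sampling."""
    h, w = img.shape
    rr, cc = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # displaced sampling coordinates, clamped to the image
    y = np.clip(rr + phi[..., 0], 0, h - 1)
    x = np.clip(cc + phi[..., 1], 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

img = np.arange(16.0).reshape(4, 4)
identity = warp(img, np.zeros((4, 4, 2)))  # zero field: image unchanged
```

A zero deformation field must reproduce the floating image exactly, which is the basic correctness check for any spatial-transform layer.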
Compared with the prior art, the technical scheme of the invention has the following characteristics. First, compared with traditional registration methods, the method and system realize end-to-end non-rigid medical image registration using deep learning and estimate the registered image directly, rather than generating the registration result by iteration as traditional methods do, thereby achieving fast and efficient medical image registration. Second, the method and system do not estimate the deformation field from the raw images directly: a structure characterization network first characterizes the multi-modal images, converting multi-modal registration into single-modal registration, and the second-level CNN network then uses a dual-branch structure combining encoded feature extraction, convolutional feature fusion and attention decoding, which can effectively extract the features of the structural representation maps. Organically combining the structure characterization network and the dual-branch network structure provides a solid foundation for efficient and accurate multi-modal image registration.

The invention discloses a non-rigid multi-modal medical image registration method based on convolutional neural networks. Following an end-to-end deep learning strategy, two stages of CNNs realize unsupervised image registration: the first-stage CNN acquires a single-modality characterization image of the multi-modal image, the second-stage CNN acquires the deformation field from the floating image to the reference image, and space conversion then realizes the registration. The two CNN stages respectively realize structural characterization and accurate estimation of the deformation field; the deformation-field estimation adopts a pyramid-encoding and attention-decoding structure that effectively suppresses irrelevant feature information, greatly improving the accuracy and efficiency of multi-modal medical image registration.

In addition, the registration system obtained from this registration method is an integrated system: it integrates the calculation of the structural representation map, the estimation of the deformation field Φ and the final spatial transformation of the floating image I_f. The user only needs to select the two-dimensional images to be registered and the registration mode (i.e. floating-image modality versus reference-image modality; for example, if the floating image is in T1 modality and the reference image is in T2 modality, the registration mode is T1-T2), and the registration map is generated directly.
Drawings
Fig. 1 is a schematic structural diagram of a non-rigid multi-modality medical image registration system according to the present invention.
FIG. 2 is a schematic diagram of a CNN _ sp structure according to the present invention.
FIG. 3 is a schematic structural diagram of CNN _ da according to the present invention.
FIG. 4a is a floating image T1 for example 1 of the present invention and comparative examples 1-3.
FIG. 4b is a reference image T2 used in example 1 of the present invention and comparative examples 1-3.
FIG. 4c shows the registration results of T1-T2 images obtained by the method of embodiment 1 of the present invention.
FIG. 4d shows the registration results of T1-T2 images obtained by the method of comparative example 1.
FIG. 4e shows the registration results of T1-T2 images obtained by the method of comparative example 2.
FIG. 4f shows the registration results of T1-T2 images obtained by the method of comparative example 3.
FIG. 5a is a floating image T1 for example 1 of the present invention and comparative examples 1-3.
Fig. 5b is a reference image PD used in example 1 of the present invention and comparative examples 1 to 3.
Fig. 5c shows the registration result of the T1-PD image obtained by the method of embodiment 1 of the present invention.
Fig. 5d shows the registration result of T1-PD image obtained by the method of comparative example 1.
Fig. 5e shows the registration result of T1-PD image obtained by the method of comparative example 2.
Fig. 5f shows the registration result of the T1-PD image obtained by the method of comparative example 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention relates to a non-rigid multi-modal medical image registration method based on unsupervised deep learning and a corresponding system. In general, a two-stage CNN (Double CNN, abbreviated DCNN) is trained with a large amount of medical data (e.g., medical data corresponding to 2000 or more original medical images, i.e., 2000 or more floating-image/reference-image pairs in total), wherein the first-stage CNN (CNN_sp) is used to obtain the characterization image and the second-stage CNN (CNN_da) is used to accurately estimate the deformation field. Then the reference image and floating image are input into the trained DCNN to obtain the registered image.
The method comprises the following steps:
(S1) Based on N1 original medical images I_o, obtain their structural representation maps SR_o through the image structure characterization algorithm, thereby obtaining N1 pairs of original medical image I_o and structural representation map SR_o as training-set data, and train the first-level CNN network with them, so that the trained first-level CNN network can generate a structural representation map from an input image;
(S2) Based on N2 medical images and the previously known reference image I_r, floating image I_f and label image I_l corresponding to them, first obtain the structural representation maps SR_r, SR_f and SR_l of the reference image I_r, floating image I_f and label image I_l respectively through the image structure characterization algorithm, then use them as training-set data to train the second-level CNN network, so that the trained second-level CNN network can estimate a deformation field;
(S3) Input the reference image and floating image of the medical image to be registered into the trained first-level CNN network respectively, obtaining the corresponding structural representation maps SR_r_g and SR_f_g;
(S4) Input the structural representation maps SR_r_g and SR_f_g of the reference image and floating image obtained in step (S3) into the trained second-level CNN network to obtain a deformation field; then apply the deformation field to the floating image of the medical image to be registered through space conversion, so that the registered image for the medical image to be registered is quickly obtained.
For example, the procedure may be:

Step one: input N reference images I_r, N floating images I_f and N label images I_l into the DCNN and train the DCNN network with these images;

Step two: input the reference image I_r and floating image I_f in their original modalities into the trained DCNN to obtain the deformation field Φ, and apply Φ to the floating image I_f using space conversion to obtain the registered image I_g.

Step one may specifically be: compute the structural representation map SR_r of the reference image, SR_f of the floating image and SR_l of the label image using the known structural representation SR; then form training pairs from the original images I_o and their representation maps SR_o and input them into CNN_sp to train the CNN_sp network; then input SR_r, SR_f and SR_l into CNN_da to train the CNN_da network.

Step two may specifically be: input the reference image I_r and floating image I_f in their original modalities into the trained CNN_sp to obtain SR_r and SR_f; then input the CNN_sp results SR_r and SR_f into the trained CNN_da to obtain the deformation field Φ; then apply the deformation field Φ to the floating image I_f using space conversion to obtain the registered image I_g.

In addition, N1 may be an integer of 1000 or more and N2 an integer of 2000 or more, since the first-level CNN is used for modality unification while the second-level CNN generates the deformation field required for registration, and modality unification requires a smaller amount of training data than deformation-field generation. During training, after training is completed for one registration mode, training can proceed mode by mode through several preselected registration modes (such as T1-T2, T1-PD, etc.).
Specifically, according to the present invention, the registration of the non-rigid multi-mode medical image comprises the following steps:
step 1, utilizing SR to calculate and obtain original image IoStructural representation of (2) SR _ o:
Figure BDA0003301858860000111
wherein
Figure BDA0003301858860000112
Representing the result of filtering the image in a grid region of a specified side length r (the size of r can be preset),
Figure BDA0003301858860000113
represents the filtering result of the image in the four neighborhoods, which is defined as follows:
Figure BDA0003301858860000114
Figure BDA0003301858860000115
Figure BDA0003301858860000116
wherein represents a convolution, MrRepresenting a grid search area of side length r, G (x) representing a Gaussian filter kernel of kernel size 3 × 3 centered on pixel x, Dis being MrThe sum of squared differences of two image blocks S centered on a pixel x and its neighboring pixels x + m (where S refers to the image block whose Euclidean distance is calculated and the result after four-neighborhood filtering
Figure BDA0003301858860000117
The physical meanings are different), and n is the number of the neighborhood elements;
Figure BDA0003301858860000118
the definition is as follows:
Figure BDA0003301858860000119
Figure BDA00033018588600001110
Figure BDA00033018588600001111
wherein c is1And c2The value of the SR is constant (the value is preset and may be a rational number greater than 1 and less than 2), so as to ensure that the SR can provide a clear and complete image representation result. M0Represents the search area of the four neighborhoods and mean (-) represents the mean operation.
Step 2: construct the CNN_sp network structure. CNN_sp consists of an input layer (Input), pyramid convolution (PyConv) and global average pooling (GAP).

The input layer accepts the original image I_o and connects directly to PyConv.

PyConv is formed by stacking, in the order of the input filter sequence, standard convolution blocks defined as follows: each standard convolution block consists of a 2D convolution layer (Conv2D) with stride 1 and kernel size 3 × 3 and a nonlinear activation layer (ReLU).
GAP accepts high dimensional features from PyConv and outputs a structural characterization result SR _ g.
Step 3: construct the loss function. The loss function L1 used to train the network of step 2 is defined as the MSE between the network-generated structural representation SR_g and the characterization image SR_o, i.e. the loss function is defined as follows:

L1 = (1/Q) · ||SR_g − SR_o||F²

wherein ||·||F represents the Frobenius norm and Q represents the number of pixels in the two-dimensional image.
Step 4: input the original images I_o (e.g., of different modalities T1, T2, PD) into CNN_sp to generate the structural characterization map SR_g, and iteratively optimize the network parameters to reduce the MSE between the structural characterization map SR_g and the characterization map SR_o until the loss error decreases and stabilizes within a certain range.
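The "optimize until the loss stabilizes within a certain range" rule of step 4 can be sketched as a loop with a plateau check. The tolerance, window size and the toy quadratic standing in for one CNN_sp parameter update are all illustrative assumptions.

```python
def train_until_stable(step_fn, tol=1e-6, window=5, max_iter=10000):
    """Run optimization steps until the loss change over `window`
    iterations falls below tol (the stopping rule of step 4)."""
    losses = []
    for _ in range(max_iter):
        losses.append(step_fn())
        if len(losses) > window and abs(losses[-1 - window] - losses[-1]) < tol:
            break
    return losses

# Toy stand-in for one network update: gradient descent on (w - 3)^2.
state = {"w": 5.0}
def step():
    w = state["w"]
    grad = 2 * (w - 3.0)            # d/dw (w - 3)^2
    state["w"] = w - 0.1 * grad
    return (state["w"] - 3.0) ** 2  # current "loss"

hist = train_until_stable(step)
```

The loop stops well before max_iter once successive losses differ by less than the tolerance, mirroring "stabilizes within a certain range".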
Step 5: using the SR calculation, obtain the structural representations SR_r, SR_f, and SR_l of the reference image (I_r), the floating image (I_f), and the label image (I_l):
[formula images: SR_r, SR_f, and SR_l are obtained from I_r, I_f, and I_l by the SR calculation formula]

where M_d (rendered as an image in the original) denotes the result of filtering the image in a grid region of the specified side length r, and S_d (rendered as an image) denotes the filtering result of the image in the four neighborhoods, defined as follows:

[formula images: definitions of M_d and S_d]
where M_r denotes a grid search area of side length r, G(x) denotes a Gaussian filter kernel of size 3 × 3 centered on pixel x, Dis is the sum of squared differences, over M_r, of two image blocks S centered on pixel x and on its neighboring pixel x + m, and n is the number of neighborhood elements;
s_d1 and s_d2 are defined as follows:

[formula images: definitions of s_d1, s_d2, and s_d; per the claims, s_d1 = c1·s_d(x)]
As before, c1 and c2 are constants (their values are fixed) that ensure SR can provide a clear and complete image representation result, M0 denotes the four-neighborhood search area, and mean(·) denotes the mean operation.
Step 6: construct the CNN_da network structure. CNN_da consists of pyramid encoding (PyEncode), feature fusion (Merge), and attention decoding (AttDecode).
Pyramid encoding (PyEncode) consists of five convolution blocks. The first convolution block consists of a convolution layer (Conv) with kernel size 3 × 3 and stride 1, a batch normalization layer (BN), and a nonlinear activation layer (ReLU), and generates a feature map (Feature 1) of the same size as the input. The next four convolution blocks each contain a convolution layer (Conv) with kernel size 3 × 3 and stride 2, a batch normalization layer (BN), and a nonlinear activation layer (ReLU); among them, only the second convolution block does not contain a residual convolution (ResConv). These four blocks generate feature maps at 1/2^i of the input size (Feature 1/2^i, i = 1, 2, 3, 4). The residual convolution (ResConv) forms a main path from three convolution layers (Conv) with kernel size 3 × 3 and stride 1, three batch normalization layers (BN), and a nonlinear activation layer (ReLU), and then merges (Concate) a direct-input shortcut (Shortcut) to form the output. The last three convolution blocks are refined (Refine) to form the final feature maps (Feature 1/2^i, i = 1, 2, 3, 4). Refinement (Refine) consists of an upsampling layer (Upsampling), a convolution layer (Conv) with kernel size 3 × 3 and stride 2, and a convolution layer (Conv) with kernel size 3 × 3 and stride 1; it takes two feature maps, Feature 1/2^i and Feature 1/2^(i+1), as simultaneous input and outputs a refined feature map (Feature 1/2^(i+1)).
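The stride-2 blocks halve each spatial dimension per level, which is where the 1/2^i feature-map sizes come from. A small helper (our own, with an assumed 256-pixel input side) checks this with the standard padded-convolution output-size formula:

```python
def conv_out_size(n, kernel=3, stride=1, pad=1):
    """Output size along one spatial axis of a padded 2D convolution:
    floor((n + 2*pad - kernel) / stride) + 1."""
    return (n + 2 * pad - kernel) // stride + 1

n = 256  # assumed input side length
sizes = [n]
for _ in range(4):  # the four stride-2 blocks of PyEncode
    sizes.append(conv_out_size(sizes[-1], kernel=3, stride=2, pad=1))
print(sizes)  # [256, 128, 64, 32, 16] -> Feature 1 and Feature 1/2^i, i=1..4
```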
Feature fusion (Merge) consists of a convolution layer (Conv) with kernel size 3 × 3 and stride 1, a batch normalization layer (BN), and a nonlinear activation layer (ReLU). It receives pairs of equal-sized feature maps generated by pyramid encoding (PyEncode) from the floating image I_f and the reference image I_r (Float Feature 1/2^i and Reference Feature 1/2^i, i = 0, 1, 2, 3, 4) and generates mixed features (Mixed Feature 1/2^i, i = 0, 1, 2, 3, 4).
Attention decoding (AttDecode) consists of attention convolution (AttConv), upsampling (Upsampling), merging (Concate), and convolution (Conv). The attention convolution (AttConv) accepts two feature maps (Feature 1/2^i and Feature 1/2^(i+1)); each is passed through a convolution with kernel size 1 × 1 and stride 1, the results are added (Add), and the sum is passed through a nonlinear activation (Sigmoid), a convolution with kernel size 3 × 3 and stride 1, a nonlinear activation (ReLU), and upsampling (Upsampling); the result is finally multiplied with the feature map (Feature 1/2^i) to obtain an attention-enhanced feature map (Feature 1/2^i). Upsampling (Upsampling) doubles the length and width of the input image by bilinear interpolation and outputs the enlarged image. Merging (Concate) concatenates the two input feature maps along the channel dimension and outputs the result. The convolution (Conv) consists of a convolution with kernel size 3 × 3 and stride 1.
Attention decoding (AttDecode) accepts the mixed features (Mixed Feature 1/2^i, i = 0, 1, 2, 3, 4) from feature fusion (Merge) and processes them layer by layer through upsampling (Upsampling), attention convolution (AttConv), merging (Concate), and convolution (Conv), retaining useful information and suppressing interference; finally, an accurately estimated deformation field Φ is generated through a convolution (Conv) with kernel size 3 × 3 and stride 1.
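The multiplicative gating inside AttConv can be illustrated with a minimal additive attention-gate sketch. Note this follows the common attention-U-Net recipe (ReLU before the sigmoid) and treats the 1 × 1 convolutions as per-pixel scalings, so the layer order and the weights w_f, w_g, w_psi are illustrative assumptions, not the patent's exact AttConv:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(feat, gate, w_f=1.0, w_g=1.0, w_psi=1.0):
    """feat and gate are same-sized feature maps (any upsampling of the
    coarser map is assumed already done). With 1x1 convolutions reduced to
    per-pixel scalings, the additive-attention mask is
    sigmoid(w_psi * relu(w_f*feat + w_g*gate))."""
    x = w_f * feat + w_g * gate          # Add
    x = np.maximum(x, 0.0)               # ReLU
    alpha = sigmoid(w_psi * x)           # attention coefficients in (0, 1)
    return feat * alpha                  # attention-enhanced feature map

f = np.array([[2.0, -1.0], [0.5, 0.0]])
g = np.array([[1.0, 1.0], [0.0, -2.0]])
out = attention_gate(f, g)
```

Because alpha lies strictly in (0, 1), the gate can only attenuate features; locations where feat and gate agree strongly keep most of their magnitude.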
Step 7: construct the loss function. The loss function for Step 6 consists of two parts. The first part is the MSE between SR_g, formed by the deformation field Φ acting on SR_f through spatial transformation, and the label image SR_l, which ensures that the generated SR_g is close in detail to SR_l. The second part is the L2 norm of the deformation field Φ, which constrains the deformation field to be sufficiently smooth. The loss function is defined as follows:
L2=LMSE+λLΦ
where λ is a preset hyperparameter used to balance LMSE and LΦ (it may, for example, be 1); LMSE and LΦ are respectively defined as follows:
LMSE = (1/Q)·||SR_g − SR_l||_F^2

[formula image: LΦ, the L2 norm of the deformation field Φ]
where ||·||_F denotes the Frobenius norm and Q denotes the number of pixels in the two-dimensional image.
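A compact numpy sketch of the two-part loss L2 = LMSE + λ·LΦ. Here the smoothness term penalizes the squared spatial gradients of the deformation field, which is one common reading of "sufficiently smooth"; whether the patent's LΦ is taken on Φ itself or on its gradients is an assumption on our part:

```python
import numpy as np

def combined_loss(sr_g, sr_l, phi, lam=1.0):
    """L2 = L_MSE + lambda * L_Phi.
    L_MSE: pixel-averaged squared Frobenius distance between SR_g and SR_l.
    L_Phi: mean squared spatial gradient of the deformation field phi
    (shape (H, W, 2)), encouraging a smooth field."""
    q = sr_g.size
    l_mse = np.sum((sr_g - sr_l) ** 2) / q
    dy, dx = np.gradient(phi, axis=(0, 1))
    l_phi = np.mean(dy ** 2 + dx ** 2)
    return l_mse + lam * l_phi

sr_g = np.zeros((8, 8)); sr_l = np.zeros((8, 8))
phi_const = np.ones((8, 8, 2))  # constant field -> zero smoothness penalty
print(combined_loss(sr_g, sr_l, phi_const))  # 0.0
```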
Step 8: input SR_r and SR_f into CNN_da to generate the deformation field Φ. The deformation field Φ then acts on SR_f through spatial transformation to generate the deformed representation SR_g, and the network parameters are iteratively optimized to reduce the loss function L2 until the loss decreases and stabilizes within a certain range.
Step 9: input the reference image I_r and the floating image I_f to be registered directly into the trained CNN_sp network to obtain the corresponding structural representations SR_r and SR_f; then input these into the trained CNN_da network to obtain the deformation field Φ that recovers the deformation; finally, apply Φ to the floating image I_f through spatial transformation to obtain the final registered image.
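The "spatial transformation" that applies Φ to the floating image amounts to dense resampling. A minimal bilinear-warp sketch in numpy, where the displacement convention (dy, dx in pixels) and the border clipping are our assumptions:

```python
import numpy as np

def warp(image, phi):
    """Resample image at x + phi(x) with bilinear interpolation.
    image: (H, W); phi: (H, W, 2) displacement field (dy, dx) in pixels."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    sy = np.clip(ys + phi[..., 0], 0, h - 1)   # sample coordinates, clipped
    sx = np.clip(xs + phi[..., 1], 0, w - 1)
    y0 = np.floor(sy).astype(int); x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = sy - y0; wx = sx - x0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bot = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bot * wy

img = np.arange(16, dtype=float).reshape(4, 4)
zero = np.zeros((4, 4, 2))
assert np.allclose(warp(img, zero), img)  # identity field leaves image unchanged
```

An integer shift of +1 along x makes each output pixel take the value of its right-hand neighbor, with the rightmost column clipped to the border.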
Example 1
Taking a brain MRI medical image as an example, the present embodiment provides a registration method of a non-rigid multi-mode medical image, which includes the following steps, as shown in fig. 1:
Step 1: use the SR calculation to obtain the structural representation SR_o of the original image I_o:
[formula image: SR calculation formula for SR_o]

where M_d (rendered as an image in the original) denotes the result of filtering the image in a grid region with the specified side length r (in this embodiment, r is 1), and S_d (rendered as an image) denotes the filtering result of the image in the four neighborhoods, defined as follows:
[formula images: definitions of M_d and S_d]
where * denotes convolution, M_r denotes a grid search area of side length r, G(x) denotes a Gaussian filter kernel of size 3 × 3 centered on pixel x, Dis is the sum of squared differences, over M_r, of two image blocks S centered on pixel x and on its neighboring pixel x + m, and n is the number of neighborhood elements;
s_d1 and s_d2 are defined as follows:

[formula images: definitions of s_d1, s_d2, and s_d; per the claims, s_d1 = c1·s_d(x)]
where c1 and c2 are constants that ensure SR can provide clear and complete image representation results; in this embodiment, c1 = 1.2 and c2 = 1.2. M0 denotes the four-neighborhood search area, and mean(·) denotes the mean operation.
Step 2: train the CNN_sp network. Input the original image I_o and the representation image SR_o into CNN_sp, and iteratively optimize the CNN_sp network with the Adam optimizer to minimize the loss function value. In this embodiment, the number of iterations is set to 300, and the training process terminates early when the loss function value varies little (e.g., by no more than ±0.010) over 30 consecutive iterations.
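The early-termination rule (stop when the loss varies by no more than ±0.010 over 30 consecutive iterations) can be sketched as a generic plateau check; the loop below is our own illustration, not the patent's training code:

```python
def train_with_plateau_stop(step, max_iters=300, window=30, tol=0.010):
    """Run `step()` (which returns the current loss) up to max_iters times;
    stop early once the loss range over the last `window` iterations
    is within tol."""
    history = []
    for _ in range(max_iters):
        history.append(step())
        if len(history) >= window:
            recent = history[-window:]
            if max(recent) - min(recent) <= tol:
                break
    return history

# toy "training": the loss decays and then flattens out
losses = iter([1.0 / (k + 1) for k in range(300)])
hist = train_with_plateau_stop(lambda: next(losses))
print(len(hist) < 300)  # stops early once the curve plateaus
```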
Step 3: train the CNN_da network. Input SR_r, SR_f, and SR_l into the CNN_da network, apply the predicted deformation field Φ to SR_f through spatial transformation to generate SR_g, and reduce the difference between SR_g and SR_l through iterative optimization. In this embodiment, λ is set to 0.08, the number of iterations is set to 300, and the training process terminates early when the loss function value varies little (e.g., by no more than ±0.010) over 30 consecutive iterations.
Step 4: input the reference image I_r and the floating image I_f to be registered directly into the trained CNN_sp network to obtain the corresponding structural representations SR_r and SR_f; then input these into the trained CNN_da network to obtain the deformation field Φ that recovers the deformation; finally, apply Φ to the floating image I_f through spatial transformation to obtain the final registered image.
Comparative example 1
Registration is achieved according to the prior-art Morph method (Balakrishnan G, Zhao A, Sabuncu M R, et al. An Unsupervised Learning Model for Deformable Medical Image Registration [C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2018.). The specific parameters are as follows: the batch size during training is 8, the learning rate is 0.001, and the momentum is 0.5.
Comparative example 2
Registration is achieved according to the prior-art MIND method (Heinrich M P, Jenkinson M, Bhushan M, et al. MIND: Modality independent neighbourhood descriptor for multi-modal deformable registration [J]. Medical Image Analysis, 2012, 16(7): 1423-1435.). The specific parameters are as follows: the image block size is selected as 3 × 3.
Comparative example 3
Registration is achieved according to the prior-art ESSD method (Wachinger C, Navab N. Entropy and Laplacian images: Structural representations for multi-modal registration [J]. Medical Image Analysis, 2012, 16(1): 1-17.). The specific parameters are as follows: 9 × 9 image blocks are selected, and the entropy corresponding to each image block is calculated using Gaussian weighting, local normalization, and Parzen window estimation, thereby obtaining the ESSD for the whole image.
Analysis of Experimental results
To further demonstrate the advantages of the present invention, we compared the registration accuracy of Example 1 with that of Comparative Examples 1-3. Registration accuracy is evaluated using the TRE value, defined as:
TRE = √((x + u − x′)² + (y + v − y′)²)
where u denotes the offset of the deformation field Φ in the x-direction, v denotes the offset of the deformation field Φ in the y-direction, and (x, x′) and (y, y′) denote the coordinate pairs of the manually selected target points.
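Given the offsets u, v and the manually selected point pairs, TRE can be computed as a mean Euclidean residual. A small numpy sketch, where the convention of adding (u, v) to the floating-image coordinates is our assumption:

```python
import numpy as np

def tre(points_ref, points_float, u, v):
    """points_ref: (N, 2) target coordinates (x', y');
    points_float: (N, 2) source coordinates (x, y);
    u, v: per-point offsets of the deformation field in x and y.
    Returns the mean Euclidean distance between warped and target points."""
    xw = points_float[:, 0] + u
    yw = points_float[:, 1] + v
    d = np.sqrt((xw - points_ref[:, 0]) ** 2 + (yw - points_ref[:, 1]) ** 2)
    return d.mean()

src = np.array([[0.0, 0.0], [1.0, 1.0]])
dst = np.array([[1.0, 0.0], [1.0, 1.0]])
print(tre(dst, src, u=np.array([1.0, 0.0]), v=np.array([0.0, 0.0])))  # 0.0
```

A perfect field drives the residual to zero; with no offsets at all, the same points give the mean landmark misalignment instead.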
Registration accuracy testing was performed using simulated MR images. The simulated T1- and T2-weighted MR images used in Example 1 and each comparative example were taken from the BrainWeb database, with T1 as the floating image and T2 as the reference image. Table 1 lists the mean and standard deviation of the TRE values obtained by each method. As can be seen from Table 1, when the multi-mode MR images are registered, the TRE values of Example 1 have a lower mean and standard deviation than those of the other methods, indicating that the proposed method achieves the highest registration accuracy among all compared methods.
To illustrate the superiority of the present invention over the remaining methods more intuitively, we provide visual effect plots of the registered images of Example 1 and Comparative Examples 1-3, as shown in Figs. 4a-4f. Fig. 4a is the floating image T1 used in Example 1 and Comparative Examples 1-3; Fig. 4b is the reference image T2; Figs. 4c-4f are the T1-to-T2 registration results obtained by Example 1 and Comparative Examples 1-3, respectively. As can be seen from the registration result images, the registration result of the method of the invention is better than that of the other methods.
TABLE 1 TRE-value comparison of the methods at the time of registration of the T1-T2 images
[table image: TRE values (mean and standard deviation) of each method for T1-T2 image registration]
Registration accuracy testing was performed using simulated MR images. The simulated T1- and PD-weighted MR images used in Example 1 and each comparative example were obtained from the BrainWeb database, with T1 as the reference image and PD as the floating image. Table 2 lists the mean and standard deviation of the TRE values obtained by each method. As can be seen from Table 2, when the multi-mode MR images are registered, the TRE values of Example 1 have a lower mean and standard deviation than those of the other methods, indicating that the proposed method achieves the highest registration accuracy among all compared methods.
To illustrate the superiority of the present invention over the remaining methods more intuitively, we provide visual effect plots of the registered images of Example 1 and Comparative Examples 1-3, as shown in Figs. 5a-5f. Fig. 5a is the floating image T1 used in Example 1 and Comparative Examples 1-3; Fig. 5b is the reference image PD; Figs. 5c-5f are the T1-PD registration results obtained by Example 1 and Comparative Examples 1-3, respectively. As can be seen from the registration result images, the registration result of the method of the invention is better than that of the other methods.
TABLE 2 TRE-value comparison of the methods at T1-PD image registration
[table image: TRE values (mean and standard deviation) of each method for T1-PD image registration]
The SR (structural representation) used in the present invention is directly adopted prior art; it can be constructed, for example, by referring to the image preprocessing process in the prior-art document (Heinrich M P, Jenkinson M, Bhushan M, et al. MIND: Modality independent neighbourhood descriptor for multi-modal deformable registration [J]. Medical Image Analysis, 2012, 16(7): 1423-1435.). The above embodiments are only examples; in addition to brain MRI medical images, the registration method and corresponding system of the invention are also applicable to other MRI medical images.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method of registration of non-rigid multi-modality medical images, comprising the steps of:
(S1) based on N1 original medical images I_o, obtaining their structural representations SR_o through a registration algorithm of image structural representation, thereby obtaining N1 pairs of original medical image I_o and structural representation SR_o as training set data; training a first-level CNN network with these data, so that the trained first-level CNN network can generate a structural representation based on an input image;
(S2) based on N2 medical images and, known in advance, their corresponding reference images I_r, floating images I_f, and label images I_l: first obtaining the structural representations SR_r, SR_f, and SR_l of the reference image I_r, floating image I_f, and label image I_l respectively through a registration algorithm of image structural representation, and using them as training set data to train a second-level CNN network, so that the trained second-level CNN network can estimate a deformation field;
(S3) respectively inputting a reference image and a floating image of the medical image to be registered into the trained first-level CNN network to obtain corresponding structure representation graphs SR _ r _ g and SR _ f _ g;
(S4) inputting the structural representation graphs SR _ r _ g and SR _ f _ g of the reference image and the floating image obtained in the step (S3) into the trained second-level CNN network to obtain a deformation field; then, the deformation field acts on the floating image of the medical image to be registered through space conversion, and therefore the registered image aiming at the medical image to be registered can be obtained.
2. The method for registration of non-rigid multi-modality medical images as claimed in claim 1, wherein in the steps (S1) and (S2), the structural characterization map is obtained by a registration algorithm of image structural characterization, and the specific calculation formula is:
[formula image: SR calculation formula]
where I denotes the object image targeted by the registration algorithm of image structural characterization, M_d represents the result of filtering the image in a grid region of given side length r, and S_d represents the filtering result of the image in the four neighborhoods, defined as follows:
[formula images: definitions of M_d and S_d]
wherein, represents a convolution; mrRepresenting a grid search area with the side length r, wherein the value of r is preset; g (x) represents a gaussian filter kernel with a kernel size of 3 × 3 centered on pixel x; dis is MrThe sum of squared differences of two image blocks S centered on pixel x and its neighboring pixels x + m; n is the number of neighborhood elements; sd1,sd2The definition is as follows:
s_d1 = c1·s_d(x)
[formula images: definitions of s_d2 and s_d]
where c1 and c2 are predetermined constants, M0 represents the search area of the four neighborhoods, and mean(·) represents the mean operation.
3. The method for registration of non-rigid multi-modality medical images of claim 1,
in step (S1), the loss function L1 used for said training is defined as the mean square error (MSE) between the structural representation SR_g generated by the first-level CNN network from the original medical image I_o in the training set data and the corresponding structural representation SR_o in the training set data, i.e.:
L1 = (1/Q)·||SR_g − SR_o||_F^2
where ||·||_F represents the Frobenius norm and Q represents the number of pixels in a two-dimensional image.
4. The method for registration of non-rigid multi-modality medical images of claim 1,
in step (S2), the loss function L2 used in the training is defined as follows:
L2=LMSE+λLΦ
wherein LMSE represents the mean square error (MSE) between SR_l and SR_g, the latter formed by the deformation field Φ generated by the second-level CNN network acting on SR_f through spatial transformation; LΦ represents the L2 norm of the deformation field Φ generated by the second-level CNN network; λ is a preset constant used to balance LMSE and LΦ; and LMSE and LΦ are respectively defined as follows:
LMSE = (1/Q)·||SR_g − SR_l||_F^2

[formula image: LΦ]
where ||·||_F represents the Frobenius norm and Q represents the number of pixels in a two-dimensional image.
5. The method for registration of non-rigid multi-modality medical images of claim 1,
the first-stage CNN network is a deep learning network and comprises a pyramid convolution (PyConv) and a global mean pooling (GAP) which are sequentially connected; wherein the pyramid convolution (PyConv) is used for extracting features of an input image, and the global mean pooling (GAP) is used for receiving the features and forming a characterization result output;
preferably, the pyramid convolution (PyConv) is formed by stacking standard convolution blocks, defined as follows, in the order of the input filter sequence: the standard convolution block consists of a 2D convolution layer (Conv2D) with stride 1 and kernel size 3 × 3 and a nonlinear activation layer (ReLU).
6. The method for registration of non-rigid multi-modality medical images of claim 1,
the second-level CNN network is a deep learning network and comprises pyramid coding (PyEncode), feature fusion (Merge) and attention decoding (AttDecode) which are sequentially connected; the pyramid coding (PyEncode) is used for extracting features of different dimensions of an input image, the feature fusion (Merge) is used for fusing features of corresponding sizes of a structure representation graph of a floating image and a structure representation graph of a reference image, and the attention decoding (AttDecode) is used for finally estimating a deformation field;
preferably, the first and second liquid crystal materials are,
the pyramid code (PyEncode) consists of five convolutional blocks, wherein,
the first convolution block consists of a convolution layer (Conv) with kernel size 3 × 3 and stride 1, a batch normalization layer (BN), and a nonlinear activation layer (ReLU);
the second to fifth convolution blocks each contain a convolution layer (Conv) with kernel size 3 × 3 and stride 2, a batch normalization layer (BN), and a nonlinear activation layer (ReLU); except for the second convolution block, the other three convolution blocks each contain a residual convolution (ResConv); the residual convolution (ResConv) forms a main path from three convolution layers (Conv) with kernel size 3 × 3 and stride 1, three batch normalization layers (BN), and a nonlinear activation layer (ReLU), and then merges (Concate) a direct-input shortcut (Shortcut) to form the output;
the third to fifth convolution blocks form the final feature maps through refinement (Refine); refinement (Refine) consists of an upsampling layer (Upsampling), a convolution layer (Conv) with kernel size 3 × 3 and stride 2, and a convolution layer (Conv) with kernel size 3 × 3 and stride 1;
the feature fusion (Merge) consists of a convolution layer (Conv) with kernel size 3 × 3 and stride 1, a batch normalization layer (BN), and a nonlinear activation layer (ReLU);
the attention decode (AttDecode) consists of attention convolution (AttConv), Upsampling (Upsampling), merging (Concate), and convolution (Conv).
7. The method for registration of non-rigid multi-modality medical images of claim 1,
in the step (S1), N1Is an integer of 1000 or more;
in the step (S2), N2Is an integer of 2000 or more.
8. A registration system for non-rigid multi-modality medical images, comprising:
first level CNN network: the method is used for obtaining corresponding structural representation graphs SR _ r _ g and SR _ f _ g by aiming at a medical image to be registered and a reference image and a floating image thereof;
the first-level CNN network is trained, and the training set data adopted for the training is specifically as follows: based on N1 original medical images I_o, their structural representations SR_o are obtained through a registration algorithm of image structural representation, thereby obtaining N1 pairs of original medical image I_o and structural representation SR_o as the training set data;
second level CNN network: the image processing device is used for obtaining a deformation field according to structural representation graphs SR _ r _ g and SR _ f _ g of a reference image and a floating image obtained by the first-level CNN network;
the second-level CNN network is trained, and the training set data adopted for the training is specifically as follows: based on N2 medical images and, known in advance, their corresponding reference images I_r, floating images I_f, and label images I_l, the structural representations SR_r, SR_f, and SR_l of the reference image I_r, floating image I_f, and label image I_l are respectively obtained through a registration algorithm of image structural representation and used as the training set data;
a space conversion module: and the deformation field obtained by the second-stage CNN network acts on a floating image of the medical image to be registered through space conversion, so that a registered image aiming at the medical image to be registered is obtained.
CN202111192744.4A 2021-10-13 2021-10-13 Non-rigid multi-mode medical image registration method and system Pending CN114022521A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111192744.4A CN114022521A (en) 2021-10-13 2021-10-13 Non-rigid multi-mode medical image registration method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111192744.4A CN114022521A (en) 2021-10-13 2021-10-13 Non-rigid multi-mode medical image registration method and system

Publications (1)

Publication Number Publication Date
CN114022521A true CN114022521A (en) 2022-02-08

Family

ID=80055970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111192744.4A Pending CN114022521A (en) 2021-10-13 2021-10-13 Non-rigid multi-mode medical image registration method and system

Country Status (1)

Country Link
CN (1) CN114022521A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114693755A (en) * 2022-05-31 2022-07-01 湖南大学 Non-rigid registration method and system for multimode image maximum moment and space consistency


Similar Documents

Publication Publication Date Title
Bashir et al. A comprehensive review of deep learning-based single image super-resolution
Zhang et al. Progressive hard-mining network for monocular depth estimation
CN111461232A (en) Nuclear magnetic resonance image classification method based on multi-strategy batch type active learning
Zhang et al. ST-unet: Swin transformer boosted U-net with cross-layer feature enhancement for medical image segmentation
US11587291B2 (en) Systems and methods of contrastive point completion with fine-to-coarse refinement
CN112132878B (en) End-to-end brain nuclear magnetic resonance image registration method based on convolutional neural network
CN113096169B (en) Non-rigid multimode medical image registration model establishing method and application thereof
Zhu et al. Csrgan: medical image super-resolution using a generative adversarial network
Wang et al. Multiscale transunet++: dense hybrid u-net with transformer for medical image segmentation
CN115375711A (en) Image segmentation method of global context attention network based on multi-scale fusion
Li et al. TransBTSV2: Towards better and more efficient volumetric segmentation of medical images
Hu et al. Recursive decomposition network for deformable image registration
CN116310098A (en) Multi-view three-dimensional reconstruction method based on attention mechanism and variable convolution depth network
CN110097499B (en) Single-frame image super-resolution reconstruction method based on spectrum mixing kernel Gaussian process regression
Zheng et al. Multi-strategy mutual learning network for deformable medical image registration
Wang et al. Joint depth map super-resolution method via deep hybrid-cross guidance filter
CN114022521A (en) Non-rigid multi-mode medical image registration method and system
CN114240809A (en) Image processing method, image processing device, computer equipment and storage medium
CN116310452B (en) Multi-view clustering method and system
Lu et al. Context-aware single image super-resolution using sparse representation and cross-scale similarity
Huang et al. Remote sensing data detection based on multiscale fusion and attention mechanism
Yuan et al. FM-Unet: Biomedical image segmentation based on feedback mechanism Unet
Zhang et al. Coarse-to-Fine depth super-resolution with adaptive RGB-D feature attention
CN114863132A (en) Method, system, equipment and storage medium for modeling and capturing image spatial domain information
Lou et al. Nu-net based gan: Using nested u-structure for whole heart auto segmentation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination