CN115375971B - Multi-modal medical image registration model training method, registration method, system and device

Publication number: CN115375971B (application CN202211024408.3A; pre-grant publication CN115375971A)
Original language: Chinese (zh)
Inventors: 王少彬, 张云, 白璐, 陈颀, 陈宇, 丁生苟, 黄玉玲, 袁星星
Assignee: Beijing Yizhiying Technology Co., Ltd.
Legal status: Active (granted)
Filing events: application CN202211024408.3A filed by Beijing Yizhiying Technology Co., Ltd.; publication of CN115375971A; application granted; publication of CN115375971B

Classifications

    • G06V10/774 — Image or video recognition using machine learning: generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/084 — Neural network learning methods: backpropagation, e.g. using gradient descent
    • G06T5/00 — Image enhancement or restoration
    • G06T7/33 — Determination of transform parameters for the alignment of images (image registration) using feature-based methods
    • G06V10/25 — Image preprocessing: determination of region of interest [ROI] or volume of interest [VOI]
    • G06V10/40 — Extraction of image or video features
    • G06V10/82 — Image or video recognition using pattern recognition or machine learning: neural networks
    • G06V20/64 — Scenes; scene-specific elements: three-dimensional objects
    • G06T2207/10081 — Image acquisition modality: computed x-ray tomography [CT]
    • G06T2207/10088 — Image acquisition modality: magnetic resonance imaging [MRI]
    • G06T2207/10104 — Image acquisition modality: positron emission tomography [PET]
    • G06T2207/20081 — Special algorithmic details: training; learning
    • G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
    • G06T2207/20104 — Interactive image processing: interactive definition of region of interest [ROI]


Abstract

The application relates to a multi-modal medical image registration model training method, a registration method, a system and a device in the technical field of medical image registration. The training method comprises the following steps: preprocessing medical data image samples to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence; inputting the four vector sequences into an image registration network model to generate a deformation field; calculating the network loss based on the deformation field and a loss function; adjusting the parameters of the image registration network model with an error back-propagation algorithm; and repeating the processes of inputting image samples, calculating the network loss and adjusting the model parameters until the loss function no longer decreases, completing training and obtaining a trained image registration network model. The method improves the stability of registration results.

Description

Multi-modal medical image registration model training method, registration method, system and device
Technical Field
The present disclosure relates to the field of medical image registration, and in particular, to a multi-modal medical image registration model training method, registration method, system, and apparatus.
Background
Medical image registration transforms images acquired by different medical devices at different times into a unified spatial coordinate system, so that the image information at the same spatial position corresponds to the same anatomical structure. This makes it possible to fuse information acquired by different devices at different times, to monitor lesion changes, and to combine multi-modal information for auxiliary diagnosis; registration is therefore widely used in medical image processing. Typical registration applications include auxiliary diagnosis, surgical planning, surgical navigation, delineation of radiotherapy target areas, lesion deformation monitoring, dose mapping and dose accumulation, image-guided radiotherapy, adaptive radiotherapy, and the like. Multi-modal image registration aligns multiple images of the same patient acquired by different types of devices at different times; it integrates the advantages of the different image types and provides more information for diagnosis and treatment, and therefore has important research significance and application value.
Multi-modal image registration presents more technical challenges than single-modality registration, which processes multiple images acquired by the same device. Because the imaging modalities and principles differ, the same anatomy appears markedly different across modality images. For example, CT images have high resolution and a large imaging range and provide clear anatomical structure information, but they cannot finely distinguish different types of soft tissue. MR image sequences are sensitive to soft-tissue differences and can distinguish tissues such as the white and grey matter of the brain, with many sequence variations, but the range acquired each time is relatively small. PET images have low resolution but present metabolic function and are very helpful for tumor diagnosis. Beyond the contrast differences in image pixel values caused by the imaging technology, the size and shape of organs in the images also differ markedly between acquisitions at different times because of respiratory motion, changes in body position, gastrointestinal motility, bladder filling, and other factors. Finding similarity indexes suited to describing the matching degree of multi-modal images, handling excessive local deformation, and imposing deformation-field constraints that conform to actual anatomical change are the urgent problems to be solved in multi-modal medical image registration technology.
Disclosure of Invention
In order to improve the stability of registration results, the present application provides a multi-modal medical image registration model training method, a registration method, a system and a device.
In a first aspect, the present application provides a method for training a registration model of a multi-modal medical image, which adopts the following technical scheme:
a multi-modal medical image registration model training method, comprising:
preprocessing a medical data image sample to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence, wherein the medical data image sample comprises a plurality of groups of paired reference images and moving images;
inputting the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence and the 3D moving image vector sequence into the image registration network model to generate a deformation field;
calculating a network loss based on the deformation field and a loss function;
adjusting parameters of the image registration network model based on an error back propagation algorithm;
and repeatedly executing the processes of inputting image samples, calculating the network loss and adjusting the model parameters until the loss function no longer decreases, completing training and obtaining a trained image registration network model.
By adopting this technical scheme, images of different modalities and different sizes can be registered directly, without rigid registration preprocessing, and matching positions can be searched over a larger range, thereby avoiding the receptive-field limitation of CNNs and improving the stability of registration results.
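For illustration, the training procedure can be sketched as the following loop, assuming a PyTorch-style setup; the loader, the hypothetical `compute_loss`, and the patience-based stopping rule are illustrative assumptions rather than the patent's implementation:

```python
import torch

def train(model, optimizer, sample_loader, compute_loss, patience=5):
    """Repeat forward pass, loss computation and back-propagation until the
    loss no longer decreases (patience-based stopping is an assumption)."""
    best_loss, stale = float("inf"), 0
    while stale < patience:
        epoch_loss = 0.0
        for fixed, moving, fixed_3d, moving_3d in sample_loader:
            # The four vector sequences produced by preprocessing.
            field = model(fixed, moving, fixed_3d, moving_3d)   # deformation field
            loss = compute_loss(field, fixed, moving)           # network loss
            optimizer.zero_grad()
            loss.backward()                                     # error back-propagation
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss < best_loss - 1e-6:
            best_loss, stale = epoch_loss, 0
        else:
            stale += 1          # stop once the loss stops decreasing
    return model
```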
Optionally, the step of performing image preprocessing on the medical data image sample includes:
respectively carrying out isotropic resampling processing on the reference image and the moving image to generate an isotropic 3D reference image and a 3D moving image;
and respectively carrying out subblock division and position coding processing on the reference image, the moving image, the 3D reference image and the 3D moving image to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence.
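A minimal sketch of the isotropic resampling step, assuming the volume is provided with its per-axis voxel spacing; the function name and the 1 mm target are illustrative:

```python
import numpy as np
from scipy.ndimage import zoom

def resample_isotropic(volume: np.ndarray, spacing, target=1.0):
    """Resample a 3D volume so that every axis has sampling interval `target`."""
    factors = [s / target for s in spacing]   # per-axis zoom factors
    return zoom(volume, factors, order=1)     # linear interpolation

# Example: a CT series with 3 mm slice spacing and 0.8 mm in-plane pixels.
ct = np.random.rand(40, 256, 256).astype(np.float32)
ct_iso = resample_isotropic(ct, spacing=(3.0, 0.8, 0.8), target=1.0)
```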
Optionally, the inputting the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence, and the 3D moving image vector sequence into the image registration network model, and generating the deformation field includes:
the reference image vector sequence is input as the query vector of a cross-attention module, and the 3D reference image vector sequence is input as the key vector and value vector of the cross-attention module;
calculating a first-level reference image feature vector sequence of an encoder cross-attention module based on the inputs of the reference image vector sequence and the 3D reference image vector sequence;
restoring the (n−1)th-level reference image vector sequence and 3D reference image vector sequence into restored medical data images, and downsampling the restored medical data images to generate an nth-level reference image vector sequence and an nth-level 3D reference image vector sequence, where n = 2, …, N;
the nth-level reference image vector sequence is input as the query vector of a cross-attention module, and the nth-level 3D reference image vector sequence is input as the key vector and value vector of the cross-attention module;
calculating an nth-level reference image feature vector sequence of the encoder cross-attention module based on the inputs of the nth-level reference image vector sequence and the nth-level 3D reference image vector sequence;
the moving image vector sequence is input as the query vector of a cross-attention module, and the 3D moving image vector sequence is input as the key vector and value vector of the cross-attention module;
calculating a first-level moving image feature vector sequence of the encoder cross-attention module based on the inputs of the moving image vector sequence and the 3D moving image vector sequence;
restoring the (n−1)th-level moving image vector sequence and 3D moving image vector sequence into restored medical data images, and downsampling the restored medical data images to generate an nth-level moving image vector sequence and an nth-level 3D moving image vector sequence, where n = 2, …, N;
the nth-level moving image vector sequence is input as the query vector of a cross-attention module, and the nth-level 3D moving image vector sequence is input as the key vector and value vector of the cross-attention module;
calculating an nth-level moving image feature vector sequence of the encoder cross-attention module based on the inputs of the nth-level moving image vector sequence and the nth-level 3D moving image vector sequence;
and inputting the reference image feature vector sequences of all levels and the moving image feature vector sequences of all levels into a decoder cross-attention module for decoding, obtaining the deformation field output by the decoder.
Optionally, the inputting the reference image feature vector sequence and the moving image feature vector sequence into a decoder cross-attention module for decoding includes:
inputting the reference image feature vector sequence of each level as the query vector of the current-level cross-attention module, and the moving image feature vector sequence of each level as the key vector and value vector of the current-level cross-attention module, generating output results at a plurality of levels;
restoring the output results of the plurality of levels into restored medical data images, and upsampling the restored medical data image of the current level before connecting it with the restored medical data image of the previous level;
and performing convolution and GELU activation on all the connected restored medical data images, followed by forward propagation and add-and-normalize processing.
Optionally, the calculating the network loss based on the deformation field and the loss function includes:
acquiring a plurality of medical deformation images, wherein a medical deformation image is an image obtained by performing amplification processing on the reference image and/or the moving image;
generating training samples based on the medical deformation images and the medical data image samples;
feeding the training samples into the image registration network model and registering them pairwise to obtain a plurality of deformation fields;
determining a composite self-constraint loss and a registration loss based on the plurality of deformation fields, wherein the registration loss comprises a continuity loss, a similarity loss and a contour-overlap loss;
and deriving the loss function based on the composite self-constraint loss and the registration loss.
Optionally, deriving the loss function based on the composite self-constraint loss and the registration loss includes substituting the composite self-constraint loss and the registration loss into the loss function formula;
the loss function formula is:

Loss = Loss_Deformation + α·Loss_Similarity + β·Loss_Contour + γ·Loss_Continuous

wherein α, β and γ are specified hyperparameters controlling the weight of the different losses in the overall loss; Loss_Deformation is the deformation-field composite self-constraint term, Loss_Similarity the similarity loss, Loss_Contour the contour constraint term, and Loss_Continuous the deformation-field continuity constraint term.
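A direct transcription of this formula, with the four component losses assumed to be computed elsewhere:

```python
def total_loss(loss_deformation, loss_similarity, loss_contour,
               loss_continuous, alpha=1.0, beta=1.0, gamma=1.0):
    """Loss = Loss_Deformation + α·Loss_Similarity + β·Loss_Contour + γ·Loss_Continuous."""
    return (loss_deformation + alpha * loss_similarity
            + beta * loss_contour + gamma * loss_continuous)
```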
In a second aspect, the present application provides a multi-modality medical image registration method, which adopts the following technical scheme:
A method of multi-modality medical image registration, comprising:
respectively carrying out isotropic resampling processing on the reference image and the moving image to generate an isotropic 3D reference image and a 3D moving image;
performing subblock division and position coding processing on the reference image, the moving image, the 3D reference image and the 3D moving image respectively to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence;
inputting the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence and the 3D moving image vector sequence into the image registration network model established by the multi-modal medical image registration model training method of the first aspect, and generating a registered moving image.
By adopting this technical scheme, images of different modalities and different sizes can be registered directly, without rigid registration preprocessing, and matching positions can be searched over a larger range, thereby avoiding the receptive-field limitation of CNNs and improving the stability of registration results.
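For illustration, applying the trained model at inference time can be sketched as below, assuming PyTorch tensors; `preprocess` and `model` are the hypothetical components described above, and the warp adds the predicted displacement to a normalised identity grid:

```python
import torch
import torch.nn.functional as F

def register(model, fixed_img, moving_img, preprocess):
    """fixed_img, moving_img: tensors of shape (1, 1, D, H, W) (assumed)."""
    f_seq, m_seq, f3d, m3d = preprocess(fixed_img, moving_img)
    disp = model(f_seq, m_seq, f3d, m3d)               # (1, 3, D, H, W) field
    B, _, D, H, W = disp.shape
    theta = torch.eye(3, 4).unsqueeze(0).expand(B, -1, -1)
    base = F.affine_grid(theta, (B, 1, D, H, W), align_corners=False)
    grid = base + disp.permute(0, 2, 3, 4, 1)          # identity + displacement
    return F.grid_sample(moving_img, grid, align_corners=False)
```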
In a third aspect, the present application provides a multi-modality medical image registration system, which adopts the following technical scheme:
A multi-modality medical image registration system, comprising:
the isotropic acquisition module is used for respectively carrying out isotropic resampling processing on the reference image and the moving image to generate an isotropic 3D reference image and a 3D moving image;
the subblock dividing and position encoding module is used for respectively carrying out subblock dividing and position encoding processing on the reference image, the moving image, the 3D reference image and the 3D moving image to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence;
an image registration network model for generating a deformation field from the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence, and the 3D moving image vector sequence;
the deformation registration module is used for registering the moving image according to the deformation field and generating a registered moving image; and the training module is used for training the image registration network model.
By adopting this technical scheme, images of different modalities and different sizes can be registered directly, without rigid registration preprocessing, and matching positions can be searched over a larger range, thereby avoiding the receptive-field limitation of CNNs and improving the stability of registration results.
In a fourth aspect, the present application provides an electronic device, which adopts the following technical scheme:
an electronic device comprising a memory and a processor, the memory having stored thereon a computer program capable of being loaded by the processor and performing the multimodal medical image registration model training method of any of the first aspects.
In a fifth aspect, the present application provides a computer readable storage device, which adopts the following technical scheme:
a computer readable storage device storing a computer program capable of being loaded by a processor and executing the multimodal medical image registration model training method of any of the first aspects.
Drawings
Fig. 1 is a schematic flow chart of a multi-modal medical image registration model training method according to an embodiment of the present application.
Fig. 2 is a block diagram of a multi-modal medical image registration model training method according to an embodiment of the present application.
Fig. 3 is a block diagram of a reference image encoding process provided in an embodiment of the present application.
Fig. 4 is a block diagram of a moving image encoding process provided in an embodiment of the present application.
Fig. 5 is a block diagram of a decoding process provided in an embodiment of the present application.
Fig. 6 is a block diagram of a parallel cross-attention module provided in an embodiment of the present application.
Fig. 7 is a block diagram of a parallel cross-attention module calculation process provided in an embodiment of the present application.
Fig. 8 is a schematic diagram of the overall delineated areas and delineated contours provided in the embodiments of the present application.
Fig. 9 is a flowchart of a multi-modality medical image registration method according to an embodiment of the present application.
Fig. 10 is a schematic flow chart of a multi-modality medical image registration system according to an embodiment of the present application.
Fig. 11 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a multi-modal medical image registration model training method provided by an embodiment of the present application, and fig. 2 is a structural block diagram of a multi-modal medical image registration model training method provided by an embodiment of the present application.
As shown in fig. 1 and 2, the main flow of the method is described as follows (steps S101 to S105):
step S101, preprocessing a medical data image sample to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence, wherein the data image sample comprises a plurality of groups of paired reference images and moving images;
In the present embodiment, the types of medical data image samples include, but are not limited to, CT images, MR images and PET images. The reference image and the moving image in each group are images of different modalities; for example, the reference image is a CT image and the moving image is an MR image, or the reference image is a CT image and the moving image is a PET image.
It should be noted that, in the medical data image sample, each pair of reference image and moving image consists of two medical data images of the same anatomical site of the same patient. In this embodiment, image registration is performed on two medical data images of different modalities, and multiple groups of reference images and moving images exist in the medical data image sample, for example multiple groups of CT and MR images of different sites.
For step S101, isotropic resampling is performed on the reference image and the moving image respectively to generate an isotropic 3D reference image and 3D moving image; sub-block division and position encoding are then performed on the reference image, the moving image, the 3D reference image and the 3D moving image respectively to generate the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence and the 3D moving image vector sequence.
In this embodiment, medical data images of different modalities acquired from the same patient at different times with different medical devices differ not only in contrast but also in the examined body range, resolution and size of the presented medical data. Before registering the medical data image samples, the reference image and moving image to be processed are resampled along the same patient-coordinate-system directions at a uniform sampling interval to generate an isotropic 3D reference image and 3D moving image. The moving image and reference image to be processed are usually stacks of equally spaced 2D slices; sampling them at equal intervals in all directions yields volume data composed of cubic voxels of the same size.
In this embodiment, when performing sub-block division, the Transformer processes a vector sequence composed of a set of vectors of length C, and the output is a vector sequence of the same number and length. A typical Transformer-based registration network divides the image into adjacent, non-overlapping sub-blocks of size p × p × p and generates one vector from the image information in each sub-block in a specified manner. Since the whole image is divided into ⌈W/p⌉ × ⌈H/p⌉ × ⌈N/p⌉ sub-blocks, the same number of vectors is generated. The vectors are linearly transformed, superimposed with position codes, and then used as the input of the Transformer's multi-head attention mechanism. Because the registration process only concerns the matching position on the moving image of each pixel of the originally input reference image, and the position of each pixel on the isotropic 3D reference image need not be obtained, the Query, Key and Value vectors input to the image registration network model are formed from different vector sequences.
Since the vector sequence providing the query vectors determines the number of vectors output by the attention mechanism, and registration only concerns the matching position of each pixel on the original reference image, the scheme samples, around each pixel position of the originally input image, a sub-block of size p × p × p (p is usually set to 5) at the same resolution. The sub-block is linearly transformed to obtain a medical data image information vector of specified length C; the position code corresponding to the sub-block is computed, with code length 3; and the position code is concatenated with the image information vector, generating a vector of length C + 3 corresponding to the sub-block. This completes the sub-block division and position encoding corresponding to the encoder network's query vectors. If the original input image has size W × H × N, the vector sequence generating the query vectors contains W × H × N vectors. Because a sub-block cannot be generated centred on a pixel position in the top or bottom slice, such a pixel position is set at the top or bottom of its sub-block, so that a p × p × p sub-block of the same resolution can still be generated; in the middle part, sub-blocks are generated centred on the image pixels.
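A sketch of this per-pixel query construction, assuming NumPy and a learned linear map of shape (C, p³); for simplicity, the top/bottom-slice rule is replaced here by edge replication, which is an assumption rather than the patent's exact scheme:

```python
import numpy as np

def query_vectors(image: np.ndarray, W: np.ndarray, pos_code: np.ndarray, p: int = 5):
    """image: (D, H, W) volume; W: (C, p**3) learned linear map;
    pos_code: (D*H*W, 3) per-sub-block position codes.
    Returns the (D*H*W, C+3) query vector sequence."""
    r = p // 2
    padded = np.pad(image, r, mode="edge")     # border simplification (assumed)
    D, H, Wd = image.shape
    vecs = []
    for z in range(D):
        for y in range(H):
            for x in range(Wd):
                sub = padded[z:z + p, y:y + p, x:x + p].reshape(-1)
                vecs.append(W @ sub)           # length-C image-information vector
    feats = np.stack(vecs)                     # one vector per pixel position
    return np.concatenate([feats, pos_code], axis=1)  # append 3-dim position code
```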
The vector sequence providing the key vectors and value vectors is generated from the resampled isotropic volume data at the same resolution, by dividing the resampled isotropic data into adjacent, non-overlapping sub-blocks. If the isotropic data has size W_iso × H_iso × N_iso, it can be divided into roughly ⌈W_iso/p⌉ × ⌈H_iso/p⌉ × ⌈N_iso/p⌉ sub-blocks of size p × p × p, where ⌊·⌋ denotes rounding down and ⌈·⌉ denotes rounding up. The resampled isotropic data may not divide into an integer number of standard-size sub-blocks. Incomplete sub-blocks that do not contain the extreme edges of the region of interest can be truncated, which corresponds to rounding down; when an incomplete sub-block does contain part of the region of interest, the data must be padded upwards, i.e., the nearest position information is copied to fill the sub-block. The subsequent linear transformation and position encoding are then completed, generating the vector sequence that provides the key vectors and value vectors.
In the present embodiment, the reference image is denoted F, the moving image M, the 3D reference image F_iso and the 3D moving image M_iso; the sampling interval is denoted ps. For position encoding, let the isotropic data of the 3D reference image have size W_Fiso × H_Fiso × N_Fiso. The position code of the centre point of the region of interest is set to (0, 0, 0); for each sub-block, the offset vector of its upper-left corner from the centre point is computed and normalised by treating the length W_Fiso × ps as unit 1, and the normalised offset vector is the position code. The position code of the topmost upper-left sub-block of the 3D reference image follows from this rule.
According to this rule, the sub-blocks of the vector sequence on the reference image that generate the query vectors are position-encoded in the same way: the position code of the centre point of the data covered by the medical data image is set to (0, 0, 0), the offset of the sub-block's upper-left corner from the centre point is computed, and the offset vector is normalised by treating the length W_Fiso × ps as unit 1, giving the position code. For the sub-blocks of the 3D moving image data, the position code of the centre point of the covered data is likewise set to (0, 0, 0), the offset of the sub-block's upper-left corner from the centre point is computed, and the offset vector is normalised by treating the length W_Fiso × ps as unit 1, giving the position code.
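The position-encoding rule can be sketched as follows, with the names W_Fiso and ps taken from the text; the centre codes as (0, 0, 0) and offsets are normalised by W_Fiso × ps:

```python
import numpy as np

def position_code(corner_xyz, center_xyz, W_Fiso, ps):
    """Offset of a sub-block's upper-left corner from the region-of-interest
    centre (coded as (0, 0, 0)), normalised by treating W_Fiso * ps as unit 1."""
    offset = np.asarray(corner_xyz, float) - np.asarray(center_xyz, float)
    return offset / (W_Fiso * ps)
```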
Step S102, inputting a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence into an image registration network model to generate a deformation field;
For step S102, the image registration network model comprises an encoder and a decoder, and the deformation field is generated based on them. During encoding, the reference image vector sequence is input as the query vector of a cross-attention module and the 3D reference image vector sequence as its key and value vectors; the first-level reference feature output vector sequence of the encoder cross-attention module is calculated from these inputs. The (n−1)th-level reference image vector sequence and 3D reference image vector sequence are restored into restored medical data images, which are downsampled to generate the nth-level reference image vector sequence and nth-level 3D reference image vector sequence, where n = 2, …, N. The nth-level reference image vector sequence is input as the query vector of a cross-attention module and the nth-level 3D reference image vector sequence as its key and value vectors, and the nth-level reference feature output vector sequence of the encoder cross-attention module is calculated from these inputs. Likewise, the moving image vector sequence is input as the query vector of a cross-attention module and the 3D moving image vector sequence as its key and value vectors; the first-level moving feature output vector sequence of the encoder cross-attention module is calculated from these inputs. The (n−1)th-level moving image vector sequence and 3D moving image vector sequence are restored into restored medical data images, which are downsampled to generate the nth-level moving image vector sequence and nth-level 3D moving image vector sequence, where n = 2, …, N. The nth-level moving image vector sequence is input as the query vector of a cross-attention module and the nth-level 3D moving image vector sequence as its key and value vectors, and the nth-level moving feature output vector sequence of the encoder cross-attention module is calculated from these inputs. Finally, the reference feature output vector sequences of all levels and the moving feature output vector sequences of all levels are input into the decoder cross-attention modules for decoding, yielding the deformation field output by the decoder.
In the present embodiment, since the reference image and the moving image differ in modality and size, they are encoded separately.
The reference image vector sequence provides the query vector sequence in the encoder's cross-attention computation, while the 3D reference image vector sequence provides the key and value vector sequences. After processing by at least one parallel cross-attention module, the first-level reference feature output vector sequence XF_1 is obtained; XF_1 provides the key and value vector sequences for subsequent parallel cross-attention modules, and the source of the query vector sequence within the same level remains consistent. To obtain information at different scales, the vector sequence of the level above the current level must be restored into a restored medical data image: the elements composing each vector are treated as different features of that position, and the restored medical data image is generated in the original arrangement. For example, the first-level reference image sequence can be restored to a (C+3) × W × H × N feature map. The downsampling operation is performed by a 3 × 3 × 3 3D convolution with stride = 2; after obtaining the (C+3) × ⌈W/2⌉ × ⌈H/2⌉ × ⌈N/2⌉ features, they are vectorised to form ⌈W/2⌉ × ⌈H/2⌉ × ⌈N/2⌉ vectors, each of length C + 3. The vector sequence obtained by downsampling the reference image features provides the query vector sequence for the nth-level parallel cross-attention module, and the downsampled feature vectors of the 3D reference image provide the key and value vector sequences. Reference feature vector sequences XF_1, XF_2, …, XF_N at different scales are obtained in turn, completing the encoding of the reference image.
In this embodiment, taking N_max = 3 as an example and referring to fig. 3: the reference image vector sequence provides the query vector sequence and the 3D reference image vector sequence provides the key and value vector sequences, which are input into each parallel cross-attention module; after processing by at least one cross-attention module, the first-level reference feature output vector sequence XF_1 is generated. The first-level reference image vector sequence and 3D reference image vector sequence are restored into second-level restored medical data images, which are downsampled by a 3 × 3 × 3 3D convolution with stride = 2 to generate the second-level reference image vector sequence and second-level 3D reference image vector sequence. The second-level reference image vector sequence again provides the query vector sequence and the second-level 3D reference image vector sequence the key and value vector sequences; the second-level query, key and value vectors are input into each parallel cross-attention module of the second level and, after processing by at least one cross-attention module, the second-level reference feature output vector sequence XF_2 is generated. The second-level restored medical data images are in turn downsampled by a 3 × 3 × 3 3D convolution with stride = 2 into the third-level restored medical data images, generating the third-level reference image vector sequence and third-level 3D reference image vector sequence; these provide the query vector sequence and the key and value vector sequences respectively, the third-level query, key and value vectors are input into each parallel cross-attention module of the third level, and after processing by at least one cross-attention module, the third-level reference feature output vector sequence XF_3 is generated.
It should be noted that the number of levels of the reference feature output vector sequence is set according to actual requirements and may be one level or multiple levels; when it is set to one level, the reference image vector sequence and the 3D reference image vector sequence need not be restored into restored medical data images.
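One encoder level change can be sketched as below, assuming PyTorch and C = 64 so that C + 3 = 67; the restore–downsample–revectorise order follows the text, while the channel counts are assumptions:

```python
import torch
import torch.nn as nn

def next_level(seq: torch.Tensor, shape, down: nn.Conv3d):
    """seq: (D*H*W, C+3) vector sequence of the previous level;
    shape: (D, H, W) of that level. Restores the image arrangement,
    downsamples with a stride-2 3x3x3 convolution, then re-vectorises."""
    c = seq.shape[1]
    vol = seq.T.reshape(1, c, *shape)        # restored medical data image
    vol = down(vol)                          # roughly halves each dimension
    return vol.squeeze(0).flatten(1).T       # back to a (D'*H'*W', C+3) sequence

down = nn.Conv3d(67, 67, kernel_size=3, stride=2, padding=1)  # C+3 = 67 if C = 64
```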
Similarly, the moving image encoding process is consistent with the reference image encoding process. The moving image vector sequence provides the query vector sequence in the encoder's cross-attention computation, while the 3D moving image vector sequence provides the key and value vector sequences. After processing by at least one parallel cross-attention module, the first-level moving feature output vector sequence XM_1 is obtained; XM_1 provides the key and value vector sequences for subsequent parallel cross-attention modules, and the source of the query vector sequence within the same level remains consistent. To obtain information at different scales, the vector sequence of the level above the current level must be restored into a restored medical data image: the elements composing each vector are treated as different features of that position, and the restored medical data image is generated in the original arrangement. For example, the first-level moving image sequence can be restored to a (C+3) × W × H × N feature map. The downsampling operation is performed by a 3 × 3 × 3 3D convolution with stride = 2; after obtaining the (C+3) × ⌈W/2⌉ × ⌈H/2⌉ × ⌈N/2⌉ features, they are vectorised to form ⌈W/2⌉ × ⌈H/2⌉ × ⌈N/2⌉ vectors, each of length C + 3. The vector sequence obtained by downsampling the moving image features provides the query vector sequence for the nth-level parallel cross-attention module, and the downsampled feature vectors of the 3D moving image provide the key and value vector sequences. Moving feature vector sequences XM_1, XM_2, …, XM_N at different scales are obtained in turn, completing the encoding of the moving image.
In this embodiment, taking N_max = 3 as an example and referring to fig. 4: the moving image vector sequence provides the query vector sequence and the 3D moving image vector sequence provides the key and value vector sequences, which are input into each parallel cross-attention module; after processing by at least one cross-attention module, the first-level moving feature output vector sequence XM_1 is generated. The first-level moving image vector sequence and 3D moving image vector sequence are restored into second-level restored medical data images, which are downsampled by a 3 × 3 × 3 3D convolution with stride = 2 to generate the second-level moving image vector sequence and second-level 3D moving image vector sequence; these again provide the query vector sequence and the key and value vector sequences, which are input into each parallel cross-attention module of the second level, and after processing by at least one cross-attention module, the second-level moving feature output vector sequence XM_2 is generated. The second-level restored medical data images are downsampled by a 3 × 3 × 3 3D convolution with stride = 2 into the third-level restored medical data images, generating the third-level moving image vector sequence and third-level 3D moving image vector sequence; these provide the query vector sequence and the key and value vector sequences, which are input into each parallel cross-attention module of the third level, and after processing by at least one cross-attention module, the third-level moving feature output vector sequence XM_3 is generated.
It should be noted that the number of levels of the moving feature output vector sequence is set according to actual requirements and may be one level or multiple levels; when it is set to one level, the moving image vector sequence and the 3D moving image vector sequence need not be restored into restored medical data images.
It should also be noted that, because the sizes of the reference image and the moving image differ, the numbers of vectors contained in the reference feature vector sequence and the moving feature vector sequence at the same level may be the same or different, but the length of each vector is always C + 3.
In this embodiment, during decoding, the reference feature output vector sequence of each level is input as the query vector of the current-level cross-attention module, and the moving feature output vector sequence of each level as its key and value vectors, generating output results at multiple levels. The output results of the multiple levels are restored into restored medical data images; the restored medical data image of the current level is upsampled and then concatenated with the restored medical data image of the previous level. Convolution and GELU activation are applied to all the concatenated restored medical data images, followed by forward propagation and add-and-normalize processing.
The Transformer in the decoder performs attention weighting according to the matching degree between the reference image feature vectors and the moving feature vectors, gradually obtaining information about the matching positions of the reference image. The reference feature vector sequence at each level provides the query vectors for the subsequent parallel cross-attention module, while the moving feature vector sequence provides the key and value vectors. After processing by at least one parallel cross-attention module, the output result of each level's parallel cross-attention module is restored into a restored medical data image. To fuse the information acquired at different scales, the restored medical data image of the lowest level is upsampled and concatenated with the restored medical data image of the level above; after a 3 × 3 × 3 convolution and GELU activation, concatenation continues with the restored medical data image of the next level above, with further 3 × 3 × 3 convolutions and GELU activations, until the restored medical data images of all levels have been linked, completing the decoding process. Because the position-encoding process normalises W_Fiso × ps to 1, the 3D deformation field provided by the decoding network is a normalised deformation field, which must be multiplied by W_Fiso × ps to generate the true deformation field.
In the present embodiment, continuing the three-level example above and referring to fig. 5: the third-level reference feature vector sequence XF_3 provides the query vector sequence and the third-level moving feature vector sequence XM_3 the key and value vector sequences. After processing by at least one parallel cross-attention module, the result is restored into the third-level restored medical data image, which is upsampled and concatenated with the second-level restored medical data image. After a 3 × 3 × 3 convolution and GELU activation, it is further concatenated with the first-level restored medical data image; after another 3 × 3 × 3 convolution and GELU activation, followed by a 1 × 1 × 1 convolution and GELU activation, the normalised deformation field is denormalised to generate the true deformation field.
During decoding, upsampling proceeds layer by layer from the lowest level upwards. When there is only a first level, no upsampling is required: the first-level restored medical data image directly undergoes a 3 × 3 × 3 convolution and GELU activation, then a 1 × 1 × 1 convolution and GELU activation, and the normalised deformation field is denormalised to generate the true deformation field.
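The three-level decoder fusion path can be sketched as below, assuming PyTorch and a common channel count c = C + 3 at every level; the module and parameter names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderFusion(nn.Module):
    """Fuse three decoder levels: upsample, concatenate, 3x3x3 conv + GELU,
    then a 1x1x1 head producing the 3-channel normalised deformation field."""
    def __init__(self, c: int):
        super().__init__()
        self.conv32 = nn.Conv3d(2 * c, c, kernel_size=3, padding=1)
        self.conv21 = nn.Conv3d(2 * c, c, kernel_size=3, padding=1)
        self.head = nn.Conv3d(c, 3, kernel_size=1)

    def forward(self, f1, f2, f3, W_Fiso, ps):
        x = F.interpolate(f3, size=f2.shape[2:], mode="trilinear")
        x = F.gelu(self.conv32(torch.cat([x, f2], dim=1)))
        x = F.interpolate(x, size=f1.shape[2:], mode="trilinear")
        x = F.gelu(self.conv21(torch.cat([x, f1], dim=1)))
        phi = self.head(x)             # normalised deformation field
        return phi * (W_Fiso * ps)     # de-normalise to the true field
```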
In this embodiment, since the number of vectors in the sequence providing the query vectors is not necessarily identical to the number of vectors in the sequences providing the key and value vectors, the original multi-head attention mechanism is modified.
As shown in fig. 6, several basic cross-attention modules operate in parallel to obtain different features. After the feature vectors of all the cross-attention modules are concatenated, forward propagation projects them back to length C + 3, and the output vectors of the cross-attention module are then obtained through forward propagation and add-and-normalize processing.
As shown in fig. 7, let X be the input vector sequence providing the Query sequence and Y the input vector sequence providing the Key and Value vectors. First, the learned linear transformation matrix W_q is applied to all vectors in sequence X to obtain the query vector sequence:

q_1 = W_q × x_1, q_2 = W_q × x_2, …, q_Nx = W_q × x_Nx.

The linear transformation matrix W_k is then applied to all vectors in sequence Y to obtain the key vector sequence:

k_1 = W_k × y_1, k_2 = W_k × y_2, …, k_Ny = W_k × y_Ny.

The transformation matrix W_v is likewise applied to all vectors in sequence Y to obtain the value vector sequence:

v_1 = W_v × y_1, v_2 = W_v × y_2, …, v_Ny = W_v × y_Ny.

Next, the dot products of the ith query vector with all key vectors are computed, giving the attention-related parameters:

α_{i,j} = q_i · k_j, i = 1, 2, …, Nx, j = 1, 2, …, Ny.

These are normalised with Softmax to obtain the attention scores:

α̂_{i,j} = exp(α_{i,j}) / Σ_{j′=1..Ny} exp(α_{i,j′}).

Finally, the attention scores are used as weights to form a weighted sum over the value vector sequence, giving the output vector sequence:

o_i = Σ_{j=1..Ny} α̂_{i,j} · v_j, i = 1, 2, …, Nx.

The output vector sequence contains the same number of vectors as the query sequence X, and the vectors have length C + 3.
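A NumPy sketch of this basic cross-attention computation, in which X and Y may contain different numbers of vectors and the output has as many vectors as X; the projection width d is an assumption:

```python
import numpy as np

def cross_attention(X, Y, W_q, W_k, W_v):
    """X: (Nx, C+3) query-side sequence; Y: (Ny, C+3) key/value-side sequence;
    W_q, W_k, W_v: (d, C+3) learned linear maps."""
    Q = X @ W_q.T                    # q_i = W_q x_i
    K = Y @ W_k.T                    # k_j = W_k y_j
    V = Y @ W_v.T                    # v_j = W_v y_j
    scores = Q @ K.T                 # alpha_{i,j} = q_i . k_j, shape (Nx, Ny)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over j
    return attn @ V                  # o_i = sum_j attn_{i,j} v_j, shape (Nx, d)
```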
Step S103, calculating network loss based on the deformation field and the loss function;
step S104, adjusting parameters of an image registration network model based on an error back propagation algorithm;
For step S103, a plurality of medical deformation images are acquired, where a medical deformation image is an image obtained by performing amplification processing on the reference image and/or the moving image; training samples are generated based on the medical deformation images and the medical data image samples; the training samples are fed into the image registration network model and registered pairwise to obtain a plurality of deformation fields; a composite self-constraint loss and a registration loss are determined based on the plurality of deformation fields, where the registration loss comprises a continuity loss, a similarity loss and a contour-overlap loss; and the loss function is derived based on the composite self-constraint loss and the registration loss.
When the image registration network model is trained, only one group of reference image and moving image is loaded at a time; however, each group of reference image and moving image is processed to obtain a plurality of medical data images, which together form one group of training samples.
In this embodiment, the amplification processing may be applied to only one of the reference image and the moving image, or to both at the same time; when performing the amplification processing, the reference image and/or the moving image may be amplified once or several times, which is not specifically limited.
The amplification processing comprises random rotation, translation and stretching operations on the reference image or the moving image, which yields a plurality of medical data images and increases the diversity of the image data within each group of training samples. This ensures that the image registration network model learns features closely related to stable registration, with rotation, translation and stretching invariance, guides the training optimisation of the model, and ensures that it can handle large deformations and stretching.
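A sketch of such amplification with SciPy, where the rotation, translation and stretching ranges are illustrative assumptions (note that the stretch changes the array shape):

```python
import numpy as np
from scipy.ndimage import rotate, shift, zoom

def augment(volume: np.ndarray, rng=np.random.default_rng()):
    out = rotate(volume, angle=rng.uniform(-10, 10), axes=(1, 2),
                 reshape=False, order=1)                       # random rotation
    out = shift(out, shift=rng.uniform(-5, 5, size=3), order=1)  # random translation
    out = zoom(out, rng.uniform(0.9, 1.1), order=1)            # random stretching
    return out
```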
The reference image and the moving image are explained below:
When registering image A to image B, image A is the moving image and image B is the reference image; when registering image B to image A, image B is the moving image and image A is the reference image.
In this embodiment, the deformation fields are described below for the cases where the training sample contains one amplified image and several amplified images, respectively.
(I) The training sample contains one amplified image
When the training sample contains one amplified image, the medical data images are combined pairwise to form a first image group and a second image group, where the second image group comprises either the amplified image of the moving image paired with the reference image, or the amplified image of the reference image paired with the moving image. The first image group and the second image group are input into the image registration network model to generate a deformation field for the first image group and a deformation field for the second image group.
In the present embodiment, take the amplified image to be the one corresponding to the moving image: the training sample contains a reference image F, a moving image M, and the amplified image of the moving image, denoted A_1(M). The training sample thus contains a first image group {M, F} and a second image group {A_1(M), F}. Inputting the first image group into the image registration network yields a deformation field φ_1; inputting the second image group into the image registration network model yields a deformation field φ_2.
(II) The training sample contains a plurality of amplified images
When the training sample contains a plurality of amplified images, all medical data images in the training sample are combined pairwise to form a first image group and a plurality of second image groups, where each second image group consists of a reference image and a moving image, at least one of which is an amplified image. The first image group and the plurality of second image groups are input into the image registration network model to generate the deformation field corresponding to the first image group and the deformation field corresponding to each second image group.
In this embodiment, two amplified images are taken as an example: one is the amplified image of the reference image and the other is the amplified image of the moving image. The training sample contains one reference image, one moving image and two amplified images, where the reference image is denoted F, the moving image M, the amplified image of the moving image A₁(M), and the amplified image of the reference image A₂(F). The training sample therefore contains a first image group {M, F} and three second image groups {A₁(M), F}, {M, A₂(F)} and {A₁(M), A₂(F)}. Inputting the first image group into the image registration network model yields a deformation field φ₁; inputting the three second image groups yields three deformation fields φ₂, φ₃ and φ₄.
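The pairwise grouping of one sample can be sketched as follows; build_image_groups is a hypothetical helper, and the convention that the raw pair {M, F} comes first is an assumption of the sketch:

```python
from itertools import product

def build_image_groups(F, M, aug_moving=(), aug_reference=()):
    """Pair every moving-side image with every reference-side image.
    The raw pair (M, F) is the first image group; every pair containing
    at least one amplified image is a second image group."""
    movings = [M] + list(aug_moving)          # e.g. [M, A1(M)]
    references = [F] + list(aug_reference)    # e.g. [F, A2(F)]
    pairs = list(product(movings, references))
    return pairs[0], pairs[1:]

# Example with one amplification per side:
# first   = (M, F)
# seconds = (M, A2(F)), (A1(M), F), (A1(M), A2(F))
```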
It should be noted that the number of the second image groups in the plurality of training samples may be the same or may be different, which is not limited in particular.
Step S104: based on the plurality of deformation fields in each training sample, the deformation field composite self-constraint condition and the registration loss of that training sample are calculated, where the registration loss comprises the similarity loss, the deformation field continuity constraint condition and the contour constraint condition. The calculation of the deformation field composite self-constraint condition, the similarity loss, the deformation field continuity constraint condition and the contour constraint condition for each group of training samples is described in turn below:
Regarding the deformation field composite self-constraint condition, this embodiment describes both the case where the training sample contains one second image group and the case where it contains a plurality of second image groups.
(I) The training sample contains one second image group
When the training sample contains one second image group, the deformation field of the first image group in each training sample is applied to the moving image of the first image group to generate a registered moving image, and matching the registered moving image against the reference image of the first image group gives a first registration equation. The deformation field corresponding to the second image group is applied to the moving image of the second image group to generate a registered moving image, and matching it against the reference image of the second image group gives the second registration equation corresponding to that group. Combining the second registration equation with the first registration equation gives the relationship between the deformation fields of the second image group and the first image group; the self-constraint condition of each group of training samples is determined from this relationship and taken as the deformation field composite self-constraint condition Loss_Deformation.
In the present embodiment, for example, the deformation field corresponding to the first image group is φ₁, the reference image is F and the moving image is M. Applying the deformation field φ₁ to the moving image M generates the registered moving image φ₁(M), and the first registration equation is φ₁(M) = F.
The deformation field corresponding to the second image group is φ₂, the moving image of the second image group is the amplified image A₁(M) and the reference image is F. Applying the deformation field φ₂ to the moving image A₁(M) generates the registered moving image φ₂(A₁(M)), and the second registration equation is φ₂(A₁(M)) = F.
Processing the first registration equation and the second registration equation gives the deformation field relationship between the second image group and the first image group as φ₂(A₁) = φ₁, and from this relationship the self-constraint condition of this group of training samples is determined as ‖φ₂(A₁) − φ₁‖². Since this group of training samples has only one second image group, its deformation field composite self-constraint condition Loss_Deformation is ‖φ₂(A₁) − φ₁‖².
(II) The training sample contains a plurality of second image groups
When the training sample contains a plurality of second image groups, the deformation field of the first image group in each group of training samples is applied to the moving image of the first image group to generate a registered moving image, and matching it against the reference image of the first image group gives a first registration equation. The deformation field corresponding to each second image group is applied to the moving image of that second image group to generate a registered moving image, and matching each registered moving image against the reference image of its second image group gives the corresponding second registration equation. Combining each second registration equation with the first registration equation gives the deformation field relationship between that second image group and the first image group; a self-constraint condition is determined from each such relationship, and the self-constraint conditions are superposed to obtain the total self-constraint condition, which is taken as the deformation field composite self-constraint condition Loss_Deformation.
In the present embodiment, for example, the deformation field corresponding to the first image group is φ₁, the reference image is F and the moving image is M. Applying the deformation field φ₁ to the moving image M generates the registered moving image φ₁(M), and the first registration equation is φ₁(M) = F.
For the deformation field φ₂, the corresponding reference image is F and the moving image is A₁(M). Applying φ₂ to the moving image A₁(M) generates the registered moving image φ₂(A₁(M)), and the second registration equation is φ₂(A₁(M)) = F. Processing the first and second registration equations gives the deformation field relationship between this second image group and the first image group as φ₂(A₁) = φ₁, and the self-constraint condition for this second image group is determined as ‖φ₂(A₁) − φ₁‖².
For the deformation field φ₃, the corresponding reference image is A₂(F) and the moving image is M. Applying φ₃ to the moving image M generates the registered moving image φ₃(M), and the second registration equation is φ₃(M) = A₂(F). Processing the first and second registration equations gives the deformation field relationship between this second image group and the first image group as A₂(φ₁(M)) = φ₃(M), and the self-constraint condition for this second image group is determined as ‖A₂(φ₁) − φ₃‖².
For the deformation field φ₄, the corresponding reference image is A₂(F) and the moving image is A₁(M). Applying φ₄ to the moving image A₁(M) generates the registered moving image φ₄(A₁(M)), and the second registration equation is φ₄(A₁(M)) = A₂(F). Processing the first and second registration equations gives the deformation field relationship between this second image group and the first image group as A₂(φ₁(M)) = φ₄(A₁(M)), and the self-constraint condition for this second image group is determined as ‖φ₄(A₁) − A₂(φ₁)‖².
Adding all self-constraint conditions gives the composite self-constraint condition of this group of training samples: Loss_Deformation = ‖φ₂(A₁) − φ₁‖² + ‖A₂(φ₁) − φ₃‖² + ‖φ₄(A₁) − A₂(φ₁)‖².
Specifically, when the training sample contains a plurality of second image groups, the self-constraint condition of each second image group is calculated as above, and all of these self-constraint conditions are added to obtain the composite self-constraint condition Loss_Deformation of that group of training samples.
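A minimal sketch of this composite self-constraint, under the assumption that the deformation fields and the amplification transforms A₁ and A₂ are all represented as dense displacement fields of shape (3, D, H, W) with backward-warping semantics, might read:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def compose(first, second):
    """Displacement field of applying `first` and then `second` under
    backward warping: c(x) = second(x) + first(x + second(x))."""
    grid = np.indices(second.shape[1:], dtype=np.float32)
    coords = grid + second
    first_at = np.stack([map_coordinates(first[c], coords, order=1,
                                         mode='nearest') for c in range(3)])
    return first_at + second

def composite_self_constraint(phi1, phi2, phi3, phi4, a1, a2):
    """Loss_Deformation = ‖φ2(A1) − φ1‖² + ‖A2(φ1) − φ3‖² + ‖φ4(A1) − A2(φ1)‖²."""
    phi2_a1 = compose(a1, phi2)   # A1 applied first, then φ2
    a2_phi1 = compose(phi1, a2)   # φ1 applied first, then A2
    phi4_a1 = compose(a1, phi4)   # A1 applied first, then φ4
    return (np.mean((phi2_a1 - phi1) ** 2)
            + np.mean((a2_phi1 - phi3) ** 2)
            + np.mean((phi4_a1 - a2_phi1) ** 2))
```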
The contour constraint condition Loss_Contour is calculated as follows:
A Dice similarity coefficient (DSC) is calculated from the mask of the delineated region of the reference image in the first image group and the mask of the delineated region of the registered moving image, and the first contour constraint Loss_Contour,1 is determined from this Dice coefficient. A DSC is likewise calculated from the mask of the delineated region of the reference image in each second image group and the mask of the delineated region of the corresponding registered moving image, and the second contour constraint Loss_Contour,2 is determined from it. The first contour constraint Loss_Contour,1 and all second contour constraints Loss_Contour,2 are summed to obtain the total contour constraint, which is taken as the contour constraint condition Loss_Contour. In delineated regions with larger deformation, Loss_Contour is computed as the Dice loss of the contours before and after deformation; in relatively stable regions, it is computed as the Dice loss of the delineated regions; the superposition of the two can also be used as the structural constraint loss.
Based on the annotation information of certain organs or tissues, the contour loss over the whole delineated region is computed by taking the mask of the region delineated on the reference image (figure a) and the mask of the delineated region of the deformed moving image and calculating their DSC (Dice similarity coefficient); the corresponding structural loss is 1 − DSC. Figure b shows the boundary mask of the delineated region, obtained by intersecting the result mask1 of dilating the delineated-region mask outward by 3 mm with the negation of the result mask2 of eroding the delineated region inward by 3 mm. The DSC of the boundary masks is then computed from the boundary mask of the reference image's delineated region and the boundary mask of the deformed moving image's delineated region; the corresponding structural loss is again 1 − DSC.
DSC (Dice similarity coefficient), also called the Dice coefficient, measures the degree of overlap DSC(A, B) of two binarized maps A and B. In this embodiment, A is the binary map of the label of a given organ at risk on the reference image, with the organ labeled 1 and all other parts labeled 0; B is the binary map of the label of the same organ at risk on the moving image after deformation by the deformation field, again with the organ labeled 1 and all other parts labeled 0. The mask is the binary map of the delineated part of the reference image or moving image.
DSC is calculated by the following formula:

DSC(A, B) = 2|A ∩ B| / (|A| + |B|)

It measures the proportion of correct segmentation overlap: the better the segmentation agreement, the larger the DSC.
If it is taken as a loss, then Loss_Contour = 1 − DSC(A, B).
To emphasize the ability to follow changes in the outline of the marked region, a contour Dice component can also be introduced on top of the Dice loss calculated from the organ-at-risk labels:

DSC_border(A, B) = 2|Border(A) ∩ Border(B)| / (|Border(A)| + |Border(B)|)

Border(A) = Dilate(A, 3) ∩ (~Dilate(A, −3))
Taking Border(A) as an example: the contour mask A is first dilated by 3 mm to obtain Dilate(A, 3), and then eroded by 3 mm to obtain Dilate(A, −3); Dilate(A, −3) is negated to give ~Dilate(A, −3), and the intersection of the dilation result with the negated erosion result gives the boundary mask information of the organ-at-risk label, according to the formula Border(A) = Dilate(A, 3) ∩ (~Dilate(A, −3)). The contour constraint loss can then be calculated from this boundary mask.
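A sketch of the contour constraint, assuming the 3 mm margin has already been converted into a voxel count after isotropic resampling and that the region and border Dice losses are simply summed, could be:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def dice(a, b, eps=1e-6):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + eps)

def border(mask, margin_vox=3):
    # Border(A) = Dilate(A, 3) ∩ (~Dilate(A, −3)); erosion plays the role
    # of the negative dilation
    dilated = binary_dilation(mask, iterations=margin_vox)
    eroded = binary_erosion(mask, iterations=margin_vox)
    return np.logical_and(dilated, ~eroded)

def contour_loss(ref_mask, warped_mask, margin_vox=3):
    region_term = 1.0 - dice(ref_mask, warped_mask)
    border_term = 1.0 - dice(border(ref_mask, margin_vox),
                             border(warped_mask, margin_vox))
    return region_term + border_term   # superposition of the two Dice losses
```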
It should be noted that the contour constraint condition Loss_Contour of each group of training samples is obtained by adding the contour constraints of the first image group and all the second image groups.
The deformation field continuity constraint condition Loss_Continuous is calculated as follows:

Based on the deformation field of the first image group, the Jacobian matrix of the deformation field is calculated at each pixel position p, giving the first Jacobian determinant; the first deformation field continuity constraint Loss_Continuous,1 is determined from the first Jacobian determinant. Based on the deformation field of each second image group, the Jacobian matrix is likewise calculated at each pixel position p, giving the second Jacobian determinant; the second deformation field continuity constraint Loss_Continuous,2 is determined from it. The first deformation field continuity constraint Loss_Continuous,1 and all second deformation field continuity constraints Loss_Continuous,2 are summed to obtain the deformation field continuity constraint condition Loss_Continuous.
Loss_Continuous is mainly used to constrain the deformation field at delineated points with a rigid structure, so that the deformation field does not fold.
Using the deformation field φ, the Jacobian matrix J_φ(p) of the deformation field is calculated at each pixel position p, together with its determinant:

J_φ(p) = | ∂φ_x(p)/∂x  ∂φ_x(p)/∂y  ∂φ_x(p)/∂z |
         | ∂φ_y(p)/∂x  ∂φ_y(p)/∂y  ∂φ_y(p)/∂z |
         | ∂φ_z(p)/∂x  ∂φ_z(p)/∂y  ∂φ_z(p)/∂z |

If the deformation field is not to fold, the Jacobian determinant value det(J_φ(p)) must be greater than 0 at every position p.

To ensure that the deformation field does not fold, the value of the Jacobian determinant is examined at each pixel position: if the value is greater than zero, the determinant loss S(p) takes its minimum value of zero; if the determinant value is less than or equal to zero, S(p) takes its maximum value of 1. The loss of the deformation field as a whole is the average of the determinant losses over all points. The calculation formula is:

S(p) = 0 if det(J_φ(p)) > 0, and S(p) = 1 otherwise

Loss_Continuous = (1/N) Σ_p S(p), where N is the number of pixel positions.
It should be noted that the deformation field continuity constraint condition Loss_Continuous of each group of training samples is obtained by adding the deformation field continuity constraints of the first image group and all the second image groups.
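A literal sketch of this determinant loss follows. Note that the 0/1 step rule stated above is not differentiable, so a training implementation would need a smooth surrogate; that substitution is an assumption beyond what is stated here:

```python
import numpy as np

def jacobian_determinant(disp):
    """det J_φ(p) for φ(p) = p + disp(p); disp has shape (3, D, H, W)."""
    J = np.empty(disp.shape[1:] + (3, 3), dtype=np.float32)
    for c in range(3):
        d = np.gradient(disp[c])   # ∂disp_c/∂x, ∂disp_c/∂y, ∂disp_c/∂z
        for ax in range(3):
            J[..., c, ax] = d[ax] + (1.0 if c == ax else 0.0)
    return np.linalg.det(J)

def continuity_loss(disp):
    det = jacobian_determinant(disp)
    s = (det <= 0).astype(np.float32)   # S(p) = 0 where det > 0, else 1
    return float(s.mean())              # average over all pixel positions
```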
The similarity loss Loss_Similarity is calculated as follows:
The first mutual information of the reference image and the moving image in the first image group is calculated from the mutual information formula, and the first similarity loss Loss_Similarity,1 is calculated from it; the second mutual information of the reference image and the moving image in each second image group is calculated from the same formula, and the second similarity loss Loss_Similarity,2 is calculated from it. The first similarity loss Loss_Similarity,1 and all second similarity losses Loss_Similarity,2 are summed to obtain the overall similarity loss Loss_Similarity.
Mutual information I(A, B) is a measure of the degree of correlation of two random variables and can also be used to calculate image similarity. Here A is the reference image and B is the moving image obtained after deformation by the deformation field. The mutual information and similarity loss are calculated as:

I(A, B) = H(A) + H(B) − H(A, B)

Loss_Similarity(A, B) = 2 − I(A, B)
Since the mutual information here is distributed in [0, 2), the mutual information parameter is taken as 2 in this embodiment, and the similarity loss is written 2 − I(A, B), so that the smaller the similarity loss, the more similar the images.
Here H(A) is the entropy of the reference image and H(B) the entropy of the moving image obtained after deformation by the deformation field; they are calculated from the one-sided probability densities P_A(a) and P_B(b) of the images. H(A, B) is the joint entropy, calculated from the joint probability density P_AB(a, b). The one-sided probability density P_A(a) is obtained from the statistical histogram, and P_AB(a, b) from the joint statistical histogram. If mutual information that is continuous and differentiable with respect to the deformation field is required, a Parzen window can be added when computing the statistical histograms: the pixel value B(x) at the current position x then contributes not only to the bin it falls into but also to adjacent bins, with the contribution values determined by an N-th order B-spline function. The joint histogram is computed first, and the one-sided statistical histograms are then derived from it.
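A minimal histogram-based sketch of this similarity loss (without the Parzen-window smoothing, so it is not differentiable; the bin count is an assumed parameter) might be:

```python
import numpy as np

def mutual_information(a, b, bins=64, eps=1e-10):
    """I(A, B) = H(A) + H(B) − H(A, B) from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    entropy = lambda p: -np.sum(p * np.log(p + eps))
    return entropy(p_a) + entropy(p_b) - entropy(p_ab)

def similarity_loss(ref, warped):
    return 2.0 - mutual_information(ref, warped)
```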
It should be noted that the similarity loss Loss_Similarity of each group of training samples is obtained by adding the mutual information losses of the first image group and all the second image groups.
Specifically, the deformation field composite self-constraint condition and the registration losses are combined into the loss function of each group of training samples:

Loss = Loss_Deformation + α·Loss_Similarity + β·Loss_Contour + γ·Loss_Continuous

where α, β and γ are specified hyperparameters controlling the weight of the different losses in the total loss, Loss_Deformation is the deformation field composite self-constraint condition, Loss_Similarity the similarity loss, Loss_Contour the contour constraint condition, and Loss_Continuous the deformation field continuity constraint condition.
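As a sketch, the combination is a plain weighted sum; the default weights below are placeholders, since the embodiment does not fix the values of α, β and γ:

```python
def total_loss(loss_deformation, loss_similarity, loss_contour, loss_continuous,
               alpha=1.0, beta=1.0, gamma=1.0):
    # Loss = Loss_Deformation + α·Loss_Similarity + β·Loss_Contour + γ·Loss_Continuous
    return (loss_deformation + alpha * loss_similarity
            + beta * loss_contour + gamma * loss_continuous)
```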
Step S105, the processes of inputting image samples, calculating the network loss and adjusting the model parameters are repeated until the loss function no longer decreases; training is then complete and the trained image registration network model is obtained.
In this embodiment, the loss function is calculated after each registration is completed during training, and the model parameters are corrected using the error back-propagation algorithm; when further back-propagation can no longer reduce the post-registration loss, training stops and the trained image registration network model is obtained.
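A PyTorch-style training loop can be sketched as below; compute_loss is a hypothetical helper standing in for the loss described above, and the patience-based stopping rule is one possible reading of "no longer reduced":

```python
def train(model, loader, optimizer, compute_loss, patience=5):
    best, stall = float('inf'), 0
    while stall < patience:            # stop once the loss stops decreasing
        epoch_loss = 0.0
        for batch in loader:
            fields = model(*batch)     # deformation fields, one per image pair
            loss = compute_loss(fields, batch)
            optimizer.zero_grad()
            loss.backward()            # error back-propagation
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss < best - 1e-6:
            best, stall = epoch_loss, 0
        else:
            stall += 1
```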
Fig. 9 is a flowchart of a multi-modality medical image registration method according to an embodiment of the present application.
As shown in fig. 9, the main flow of the method is described as follows (steps S201 to S203):
step S201, respectively carrying out isotropic resampling processing on the reference image and the moving image to generate an isotropic 3D reference image and a 3D moving image;
In this embodiment, the reference image and the moving image are preprocessed in the same way as in the multi-modality medical image registration model training method above: the reference image and the moving image are resampled along the same patient coordinate system directions at a uniform sampling interval to generate the isotropic 3D reference image and 3D moving image, and the reference image, moving image, 3D reference image and 3D moving image are each divided into sub-blocks and position-encoded to generate the reference image vector sequence, moving image vector sequence, 3D reference image vector sequence and 3D moving image vector sequence. The specific image preprocessing is not described further here.
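For illustration, the isotropic resampling might be done with SimpleITK roughly as below; the 1 mm target spacing and linear interpolation are assumptions of the sketch:

```python
import SimpleITK as sitk

def resample_isotropic(image, spacing_mm=1.0):
    """Resample a SimpleITK image to isotropic voxels along its existing
    patient-coordinate directions."""
    orig_spacing, orig_size = image.GetSpacing(), image.GetSize()
    new_size = [int(round(sz * sp / spacing_mm))
                for sz, sp in zip(orig_size, orig_spacing)]
    f = sitk.ResampleImageFilter()
    f.SetOutputSpacing((spacing_mm,) * image.GetDimension())
    f.SetSize(new_size)
    f.SetOutputOrigin(image.GetOrigin())
    f.SetOutputDirection(image.GetDirection())
    f.SetInterpolator(sitk.sitkLinear)
    return f.Execute(image)
```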
Step S202, sub-block division and position coding processing are respectively carried out on the reference image, the moving image, the 3D reference image and the 3D moving image, and a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence are generated.
Step S203, inputting the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence and the 3D moving image vector sequence into the image registration network model established by the multi-modality medical image registration model training method of any one of claims 1 to 5, to generate a registered moving image.
In this embodiment, the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence, and the 3D moving image vector sequence to be registered are sent into a trained image registration network model, and the trained image registration network model is used to process and output a registration image.
It should be noted that, the image types of the reference image and the moving image to be registered need to be consistent with the image types of the registration images supported by the image registration network model, for example, the registration images supported by the image registration network model are CT images and MR images, and then the reference image and the moving image to be registered need to be CT images and MR images, respectively.
Fig. 10 is a schematic flow chart of a multi-modality medical image registration system according to an embodiment of the present application.
As shown in fig. 10, the multi-modality medical image registration system includes: the system comprises an isotropy acquisition module, a subblock division and position coding module, an image registration network model and a deformation registration module. The isotropic acquisition module is used for respectively carrying out isotropic resampling processing on the reference image and the moving image; the sub-block dividing and position coding module is used for respectively carrying out sub-block dividing and position coding on the reference image, the moving image, the 3D reference image and the 3D moving image; the image registration network model is used for registering medical images to be registered and outputting registration images; and the deformation registration module is used for calculating a loss function and training the image registration network model by using the loss function.
Fig. 11 is a block diagram of an electronic device 300 according to an embodiment of the present application.
As shown in FIG. 11, the electronic device 300 includes a processor 301 and a memory 302, and may further include one or more of an information input/information output (I/O) interface 303, a communication component 304, and a communication bus 305.
Wherein the processor 301 is configured to control the overall operation of the electronic device 300 to complete all or part of the steps of the above-described multi-modality medical data image registration model building method; the memory 302 is used to store various types of data to support operation at the electronic device 300, which may include, for example, instructions for any application or method operating on the electronic device 300, as well as application-related data. The Memory 302 may be implemented by any type or combination of volatile or non-volatile Memory devices, such as one or more of static random access Memory (Static Random Access Memory, SRAM), electrically erasable programmable Read-Only Memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), erasable programmable Read-Only Memory (Erasable Programmable Read-Only Memory, EPROM), programmable Read-Only Memory (Programmable Read-Only Memory, PROM), read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk, or optical disk.
The I/O interface 303 provides an interface between the processor 301 and other interface modules, which may be a keyboard, mouse, buttons, etc.; these buttons may be virtual or physical. The communication component 304 is used for wired or wireless communication between the electronic device 300 and other devices. Wireless communication includes, for example, Wi-Fi, Bluetooth, near field communication (Near Field Communication, NFC for short), 2G, 3G or 4G, or a combination of one or more of them; the corresponding communication component 304 may therefore include a Wi-Fi part, a Bluetooth part and an NFC part.
The electronic device 300 may be implemented by one or more application specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASIC), digital signal processors (Digital Signal Processor, abbreviated as DSP), digital signal processing devices (Digital Signal Processing Device, abbreviated as DSPD), programmable logic devices (Programmable Logic Device, abbreviated as PLD), field programmable gate arrays (Field Programmable Gate Array, abbreviated as FPGA), controllers, microcontrollers, microprocessors, or other electronic components for performing the multi-modal medical data image registration model building method as set forth in the above embodiments.
Communication bus 305 may include a pathway to transfer information between the aforementioned components. The communication bus 305 may be a PCI (Peripheral Component Interconnect, peripheral component interconnect standard) bus or an EISA (Extended Industry Standard Architecture ) bus, or the like. The communication bus 305 may be divided into an address bus, a data bus, a control bus, and the like.
The electronic device 300 may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like, and may also be a server, and the like.
The application also provides a computer readable storage device, wherein the computer readable storage device is stored with a computer program, and the computer program realizes the steps of the multi-mode medical data image registration model establishment method when being executed by a processor.
The computer readable storage device may include: a U-disk, a removable hard disk, a read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk or an optical disk, or the like, which can store program codes.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The foregoing description is only of the preferred embodiments of the present application and is presented as a description of the principles of the technology being utilized. It will be appreciated by persons skilled in the art that the scope of the application referred to in this application is not limited to the specific combinations of features described above, but it is intended to cover other embodiments in which any combination of features described above or their equivalents is possible without departing from the spirit of the application. Such as the above-mentioned features and the technical features having similar functions (but not limited to) applied for in this application are replaced with each other.

Claims (8)

1. A method for training a multi-modal medical image registration model, comprising:
preprocessing a medical data image sample to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence, wherein the data image sample comprises a plurality of groups of paired reference images and moving images;
Inputting the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence and the 3D moving image vector sequence into an image registration network model to generate a deformation field;
calculating a network loss based on the deformation field and a loss function;
adjusting parameters of the image registration network model based on an error back propagation algorithm;
repeatedly executing the processes of inputting image samples, calculating network loss and adjusting model parameters until the loss function is no longer reduced, and completing training to obtain a trained image registration network model;
the step of preprocessing the medical data image samples comprises:
respectively carrying out isotropic resampling processing on the reference image and the moving image to generate an isotropic 3D reference image and a 3D moving image;
performing subblock division and position coding processing on the reference image, the moving image, the 3D reference image and the 3D moving image respectively to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence;
the inputting the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence, and the 3D moving image vector sequence into the image registration network model, generating a deformation field comprises:
The reference image vector sequence is used as a query vector input in a cross attention module, and the 3D reference image vector sequence is used as a key vector and a value vector input in the cross attention module;
calculating a first level reference image feature vector sequence of an encoder cross-attention module based on the reference image vector sequence and an input of the 3D reference image vector sequence;
restoring the reference image vector sequence and the 3D reference image vector sequence of the (n-1)-th level into restored medical data images, and performing a downsampling operation on the restored medical data images to generate the reference image vector sequence of the n-th level and the 3D reference image vector sequence of the n-th level, wherein n = 2, …, N;

inputting the n-th level reference image vector sequence as the query vector of a cross attention module, and the n-th level 3D reference image vector sequence as the key vector and value vector of the cross attention module;

calculating the n-th level reference image feature vector sequence of the encoder cross attention module based on the inputs of the n-th level reference image vector sequence and the n-th level 3D reference image vector sequence;
The moving image vector sequence is input as a query vector in a cross attention module, and the 3D moving image vector sequence is input as a key vector and a value vector in the cross attention module;
calculating a first-level moving image feature vector sequence of an encoder cross-attention module based on the moving image vector sequence and an input of the 3D moving image vector sequence;
restoring the moving image vector sequence and the 3D moving image vector sequence of the (n-1)-th level into restored medical data images, and performing a downsampling operation on the restored medical data images to generate the moving image vector sequence of the n-th level and the 3D moving image vector sequence of the n-th level, wherein n = 2, …, N;

inputting the n-th level moving image vector sequence as the query vector of a cross attention module, and the n-th level 3D moving image vector sequence as the key vector and value vector of the cross attention module;

calculating the n-th level moving image feature vector sequence of the encoder cross attention module based on the inputs of the n-th level moving image vector sequence and the n-th level 3D moving image vector sequence;
And inputting the reference image characteristic vector sequences of all levels and the moving image characteristic vector sequences of all levels into a decoder cross attention module for decoding, and obtaining a deformation field output by the decoder.
2. The method of claim 1, wherein inputting the full-level reference image feature vector sequence and the full-level moving image feature vector sequence into the decoder cross-attention module for decoding comprises:
the method comprises the steps of inputting a reference image feature vector sequence of each level as a query vector in a current level cross attention module, inputting a moving image feature vector sequence of each level as a key vector and a value vector in the current level cross attention module, and generating a plurality of levels of output results;
restoring the output results of the multiple levels into restored medical data images, and concatenating the upsampled restored medical data image of the current level with the restored medical data image of the previous level;

and performing a convolution operation and GELU activation on all the concatenated restored medical data images, followed by forward propagation and superposition normalization processing.
3. The method of claim 2, wherein the calculating network losses based on the deformation field and a loss function comprises:
acquiring a plurality of medical deformation images, wherein the medical deformation images are images obtained by performing amplification processing on the reference image and/or the moving image;
generating a training sample based on the medical deformation image and the medical data image sample;
performing amplification processing on the reference image and/or the moving image to generate a plurality of medical deformation images;
the training samples are sent into the image registration network model, and registration is carried out two by two to obtain a plurality of deformation fields;
determining a composite self-constraining loss and a registration loss based on the plurality of deformation fields, wherein the registration loss comprises a continuity loss, a similarity loss, and a contour overlap loss;
the loss function is derived based on the composite self-constraining loss and the registration loss.
4. The method of claim 3, wherein the deriving the loss function based on the composite self-constraining loss and the registration loss comprises:
inputting the composite self-constrained loss and the registration loss into a loss function formula of the loss function;
The loss function formula is:
Loss = Loss_Deformation + α·Loss_Similarity + β·Loss_Contour + γ·Loss_Continuous

wherein α, β and γ are specified hyperparameters for controlling the weight of the different losses in the total loss, Loss_Deformation is the deformation field composite self-constraint condition, Loss_Similarity is the similarity loss, Loss_Contour is the contour constraint condition, and Loss_Continuous is the deformation field continuity constraint condition.
5. A method of multi-modality medical image registration, comprising:
respectively carrying out isotropic resampling processing on the reference image and the moving image to generate an isotropic 3D reference image and a 3D moving image;
performing subblock division and position coding processing on the reference image, the moving image, the 3D reference image and the 3D moving image respectively to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence;
inputting the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence and the 3D moving image vector sequence into an image registration network model established by the multi-modal medical image registration model training method of any one of claims 1 to 3, and generating a registered moving image.
6. A multi-modality medical image registration system, comprising:
The isotropic acquisition module is used for respectively carrying out isotropic resampling processing on the reference image and the moving image to generate an isotropic 3D reference image and a 3D moving image;
the subblock dividing and position encoding module is used for respectively carrying out subblock dividing and position encoding processing on the reference image, the moving image, the 3D reference image and the 3D moving image to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence;
an image registration network model for generating a deformation field from the reference image vector sequence, the moving image vector sequence, the 3D reference image vector sequence, and the 3D moving image vector sequence;
the deformation registration module is used for registering the moving image according to the deformation field and generating a registered moving image;
the training module is used for training the image registration network model;
the isotropic acquisition module is specifically used for respectively performing isotropic resampling processing on the reference image and the moving image to generate an isotropic 3D reference image and 3D moving image; and performing sub-block division and position encoding processing on the reference image, the moving image, the 3D reference image and the 3D moving image respectively to generate a reference image vector sequence, a moving image vector sequence, a 3D reference image vector sequence and a 3D moving image vector sequence;
The image registration network model is specifically used for: inputting the reference image vector sequence as the query vector of a cross attention module, and the 3D reference image vector sequence as the key vector and value vector of the cross attention module; calculating the first-level reference image feature vector sequence of the encoder cross attention module based on the inputs of the reference image vector sequence and the 3D reference image vector sequence; restoring the reference image vector sequence and the 3D reference image vector sequence of the (n-1)-th level into restored medical data images, and performing a downsampling operation on the restored medical data images to generate the reference image vector sequence of the n-th level and the 3D reference image vector sequence of the n-th level, wherein n = 2, …, N; inputting the n-th level reference image vector sequence as the query vector of a cross attention module, and the n-th level 3D reference image vector sequence as the key vector and value vector of the cross attention module; calculating the n-th level reference image feature vector sequence of the encoder cross attention module based on the inputs of the n-th level reference image vector sequence and the n-th level 3D reference image vector sequence; inputting the moving image vector sequence as the query vector of a cross attention module, and the 3D moving image vector sequence as the key vector and value vector of the cross attention module; calculating the first-level moving image feature vector sequence of the encoder cross attention module based on the inputs of the moving image vector sequence and the 3D moving image vector sequence; restoring the moving image vector sequence and the 3D moving image vector sequence of the (n-1)-th level into restored medical data images, and performing a downsampling operation on the restored medical data images to generate the moving image vector sequence of the n-th level and the 3D moving image vector sequence of the n-th level, wherein n = 2, …, N; inputting the n-th level moving image vector sequence as the query vector of a cross attention module, and the n-th level 3D moving image vector sequence as the key vector and value vector of the cross attention module; calculating the n-th level moving image feature vector sequence of the encoder cross attention module based on the inputs of the n-th level moving image vector sequence and the n-th level 3D moving image vector sequence; and inputting the reference image feature vector sequences of all levels and the moving image feature vector sequences of all levels into the decoder cross attention module for decoding, to obtain the deformation field output by the decoder.
7. An electronic device comprising a processor coupled to a memory;
the processor is configured to execute a computer program stored in the memory to cause the electronic device to perform the method of any one of claims 1 to 4.
8. A computer readable storage device comprising a computer program or instructions which, when run on a computer, cause the computer to perform the method of any of claims 1 to 4.
CN202211024408.3A 2022-08-24 2022-08-24 Multi-mode medical image registration model training method, registration method, system and equipment Active CN115375971B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211024408.3A CN115375971B (en) 2022-08-24 2022-08-24 Multi-mode medical image registration model training method, registration method, system and equipment


Publications (2)

Publication Number Publication Date
CN115375971A CN115375971A (en) 2022-11-22
CN115375971B true CN115375971B (en) 2023-04-25

Family

ID=84068469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211024408.3A Active CN115375971B (en) 2022-08-24 2022-08-24 Multi-mode medical image registration model training method, registration method, system and equipment

Country Status (1)

Country Link
CN (1) CN115375971B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116189884B (en) * 2023-04-24 2023-07-25 成都中医药大学 Multi-mode fusion traditional Chinese medicine physique judging method and system based on facial vision

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11154196B2 (en) * 2017-06-20 2021-10-26 Siemens Healthcare Gmbh Deep-learnt tissue deformation for medical imaging
CN111862174B (en) * 2020-07-08 2023-10-03 清华大学深圳国际研究生院 Cross-modal medical image registration method and device
CN113269815B (en) * 2021-05-14 2022-10-25 中山大学肿瘤防治中心 Deep learning-based medical image registration method and terminal
CN114119689A (en) * 2021-12-02 2022-03-01 厦门大学 Multi-modal medical image unsupervised registration method and system based on deep learning
CN114445471A (en) * 2022-01-26 2022-05-06 山东大学 Multi-modal medical image registration method and system based on multi-standard fusion

Also Published As

Publication number Publication date
CN115375971A (en) 2022-11-22

Similar Documents

Publication Publication Date Title
CN109993726B (en) Medical image detection method, device, equipment and storage medium
Jiang et al. COVID-19 CT image synthesis with a conditional generative adversarial network
Khouloud et al. W-net and inception residual network for skin lesion segmentation and classification
Cao et al. Deformable image registration based on similarity-steered CNN regression
Trinh et al. Novel example-based method for super-resolution and denoising of medical images
JP2022518446A (en) Medical image detection methods and devices based on deep learning, electronic devices and computer programs
US8811697B2 (en) Data transmission in remote computer assisted detection
CN109754396B (en) Image registration method and device, computer equipment and storage medium
Sokooti et al. 3D convolutional neural networks image registration based on efficient supervised learning from artificial deformations
CN111951281B (en) Image segmentation method, device, equipment and storage medium
CN115359103B (en) Image registration network model and establishing method, device and medium thereof
Shu et al. LVC-Net: Medical image segmentation with noisy label based on local visual cues
CN115375971B (en) Multi-mode medical image registration model training method, registration method, system and equipment
Jiang et al. CoLa-Diff: Conditional latent diffusion model for multi-modal MRI synthesis
Wang et al. Multiscale transunet++: dense hybrid u-net with transformer for medical image segmentation
CN112132878A (en) End-to-end brain nuclear magnetic resonance image registration method based on convolutional neural network
CN113506308A (en) Deep learning-based vertebra positioning and spine segmentation method in medical image
Kläser et al. Deep boosted regression for MR to CT synthesis
Zhang et al. A diffeomorphic unsupervised method for deformable soft tissue image registration
Dai et al. CAN3D: Fast 3D medical image segmentation via compact context aggregation
Tran et al. Deep learning-based inpainting for chest X-ray image
CN116485853A (en) Medical image registration method and device based on deep learning neural network
Yuan et al. FM-Unet: Biomedical image segmentation based on feedback mechanism Unet
Chaisangmongkon et al. External validation of deep learning algorithms for cardiothoracic ratio measurement
Korez et al. Intervertebral disc segmentation in MR images with 3D convolutional networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant