CN115082293A - Image registration method based on Swin Transformer and CNN double-branch coupling - Google Patents

Image registration method based on Swin Transformer and CNN double-branch coupling

Info

Publication number
CN115082293A
CN115082293A (application CN202210650873.1A)
Authority
CN
China
Prior art keywords
image
swin
cnn
branch
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210650873.1A
Other languages
Chinese (zh)
Inventor
李敏
范盼
王梦文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202210650873.1A priority Critical patent/CN115082293A/en
Publication of CN115082293A publication Critical patent/CN115082293A/en
Pending legal-status Critical Current


Classifications

    • G06T3/147
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • G06T3/18
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4038Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks

Abstract

The invention discloses an image registration method based on the dual-branch coupling of a Swin Transformer and a CNN. The method comprises the following steps: 1. performing standard preprocessing, such as gray value normalization, center cropping and resampling, on all images in the original data; 2. concatenating the floating image and the fixed image, feeding the concatenated pair into a registration network, and passing it in parallel through two encoder branches, a Swin Transformer branch and a CNN branch; 3. at each stage of the Swin Transformer, performing feature interaction and fusion between the Swin Transformer feature maps and the CNN feature maps of corresponding resolution through a dual-branch feature coupling module; 4. the decoder adaptively adjusts the deep features from the encoder and the features from the upper layer, and finally outputs a deformation field between the floating image and the fixed image; 5. inputting the floating image and the deformation field into a spatial transformation network to obtain the registered image; 6. calculating the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field, and back-propagating to train the network. The Swin Transformer and CNN branches are used jointly for feature extraction, so the advantages of both branches are fully exploited and their features complement each other.

Description

Image registration method based on Swin Transformer and CNN double-branch coupling
Technical Field
The invention belongs to the technical field of image registration, and particularly relates to an optimization method for effectively improving image registration performance.
Background Art
Deformable Image Registration (DIR) is a fundamental task in image processing with important clinical value, and it has recently received a great deal of attention. Many conventional registration methods minimize a cost function iteratively; however, these methods are computationally intensive, and registering a single pair of images takes a long time. In recent years, with the rapid development of Deep Learning (DL), image registration research based on deep learning has attracted researchers because of its short inference time and high accuracy. In general, deep-learning-based methods can be classified into supervised and unsupervised methods. In image registration, ground-truth deformation fields are very difficult to acquire, and manually annotated deformation fields may introduce unnecessary errors. Therefore, supervised methods usually obtain deformation-field labels from conventional algorithms or simulated deformations, but their registration accuracy then depends heavily on the quality of the generated deformation fields. Unsupervised methods have gained traction in this direction because the network can be trained using the similarity between the registered image and the fixed image as guidance, without a ground-truth deformation field. In recent years, a large number of unsupervised image registration methods based on Convolutional Neural Networks (CNN) have been proposed, all with good results. However, constrained by the limited receptive field of the convolution kernel, CNNs cannot effectively capture long-range correspondences between the moving and fixed images, which limits their performance.
Recently, Transformer-based network architectures have been introduced into various computer vision tasks because of their powerful modeling capability. Unlike the convolution operation, the self-attention mechanism in the Transformer has an effectively unlimited receptive field, which enables the Transformer to capture long-range spatial information. Although a plain Transformer has strong long-range modeling capability and can in principle capture long-distance positional correspondences, the number of voxels in an image registration task is very large, and the network struggles to find the truly corresponding voxel pairs. Meanwhile, owing to the nature of the convolution kernel, the CNN captures local detail information far better than the Transformer. In addition, window-based Transformers divide the original image into multiple windows, and these windows lack interaction. In the image registration task, because the corresponding voxel pairs of the fixed image and the floating image lie at different positions, they are likely to fall into two different windows and are therefore difficult to match. To capture local relationships more efficiently, the Swin Transformer restricts self-attention to local windows, which greatly improves efficiency while improving performance. The Swin Transformer computes self-attention within each window and introduces a shifted-window operation for better information interaction across windows. The shifted window performs very well in general vision tasks; in practice it is realized indirectly by cyclically shifting the feature map. In the image registration task, however, the benefit of this operation may be small, and the positional relationship between corresponding points in different windows still cannot be captured effectively. Because the CNN convolution kernel slides over the feature map with overlap, it can effectively compensate for the Transformer's inability to relate corresponding points that lie in different windows.
Disclosure of Invention
The invention discloses an image registration method based on Swin Transformer and CNN dual-branch coupling. The method uses a novel dual-branch coupled network structure, a U-shaped network formed by a classical encoder and decoder. The encoder consists of a Swin Transformer branch and a CNN branch and can effectively exploit both the Transformer-based self-attention features and the CNN-based convolution features. A feature coupling module complementarily fuses the Swin Transformer feature maps with the CNN feature maps in an interactive manner, which fully promotes the feature expression capability of the two encoder branches and further improves registration performance.
The technical solution of the invention is as follows: an image registration method based on Swin Transformer and CNN dual-branch coupling, comprising the following steps:
The first step: perform standard preprocessing, including gray value normalization, center cropping, resampling and affine transformation, on all images in the original data;
The second step: concatenate the floating image and the fixed image, feed the concatenated pair into the registration network, and pass it in parallel through two encoder branches, a Swin Transformer branch and a CNN branch;
The third step: at each stage of the Swin Transformer, perform feature interaction and fusion between the Swin Transformer feature maps and the CNN feature maps of corresponding resolution through a dual-branch feature coupling module;
The fourth step: the decoder adaptively adjusts the deep features from the encoder and the features from the upper layer, and finally outputs a deformation field between the floating image and the fixed image;
The fifth step: input the floating image and the deformation field into a spatial transformation network to obtain the registered image;
The sixth step: calculate the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field, and back-propagate to train the network.
Compared with the prior art, the invention has the following notable features: (1) a Swin Transformer encoder and a CNN encoder are designed in parallel, and the Swin-Transformer-based attention features and the CNN-based convolution features are fused simultaneously, which enhances the generalization capability of the model. (2) A bidirectional interaction mechanism is adopted to promote the feature extraction capability of the Swin Transformer and the CNN, and at the same time the feature maps of the two branches complement each other. (3) The network is an unsupervised end-to-end model; all modules are trained and used for inference in a unified manner, and no extra labels are needed for training. (4) The method registers images quickly and with high accuracy.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a diagram of a network architecture of the present invention.
FIG. 3 is a schematic of the Swin Transformer block.
Fig. 4 is a diagram of a dual-branch signature coupling module.
Fig. 5 shows the fixed image, the floating image, and the registration results of the different methods on the LPBA40 dataset.
Fig. 6 shows the difference maps between the registered images of the different methods and the fixed image on the LPBA40 dataset.
Detailed Description
The invention designs a registration network based on the dual-branch coupling of a Swin Transformer and a CNN. The method adopts a parallel design in which the Swin-Transformer-based self-attention features and the CNN-based convolution features promote each other through bidirectional interaction, enhancing their respective feature representations and thereby capturing accurate spatial correspondences between the input moving and fixed images. The network architecture of the invention is shown in Fig. 2.
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, the steps of the present invention will be described in detail.
In the first step, all images in the original data are preprocessed with standard operations such as gray value normalization, center cropping, resampling and affine transformation. The gray value normalization step rescales the gray values of the image to the [0,1] interval according to:

I_norm = (I − I_min) / (I_max − I_min)

where I_min and I_max denote the minimum and maximum gray values in the image, respectively.
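For illustration, a minimal sketch of this normalization step, assuming the image is held in a NumPy array (the helper name is illustrative, not from the patent):

```python
# Minimal sketch of gray-value normalization: rescale intensities to [0, 1].
import numpy as np

def normalize_gray_values(image: np.ndarray) -> np.ndarray:
    """Rescale voxel intensities to the [0, 1] interval using the image min/max."""
    i_min, i_max = float(image.min()), float(image.max())
    return (image - i_min) / (i_max - i_min + 1e-8)  # epsilon guards against a constant image
```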
In the second step, the floating image and the fixed image are concatenated and fed into the registration network, passing in parallel through the two encoder branches, Swin Transformer and CNN. The floating image and the fixed image are denoted M and F, respectively.
In the Swin Transformer branch, the input image is first divided into non-overlapping 3D patches, each of size 2 × P × P × P (the two channels of the concatenated image pair and a spatial extent of P × P × P). Let x_p^i denote the i-th image patch, where i ∈ {1, ..., N} and N is the total number of patches. Each patch is flattened and treated as a token, and each token is then projected by a linear mapping layer to a feature representation of dimension C:

z_0 = [x_p^1 E; x_p^2 E; ...; x_p^N E]

where E denotes the linear mapping; the output z_0 has dimension N × C.
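A compact way to realize such a patch partition plus linear embedding is a strided 3D convolution; the sketch below assumes a 2-channel input (the concatenated floating and fixed images) and illustrative values of P and C, and is not taken verbatim from the patent:

```python
# Patch partition + linear embedding as a strided 3D convolution (a sketch).
import torch.nn as nn

def patch_embedding(in_channels: int = 2, embed_dim: int = 96, patch_size: int = 2) -> nn.Module:
    # Each non-overlapping P x P x P patch is projected to an embed_dim-dimensional token.
    return nn.Conv3d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)
```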
After the linear mapping layer, the branch has 4 consecutive stages. The 1st stage consists of the linear mapping layer and several Swin Transformer blocks; each of the other 3 stages consists of a Patch Merging layer and several Swin Transformer blocks. A Swin Transformer block outputs the same number of tokens as its input, while the Patch Merging layer concatenates the features of each group of 2 × 2 × 2 adjacent tokens, producing an 8C-dimensional feature embedding whose dimension is then reduced to 2C by a linear layer. In this branch, the outputs of two consecutive Swin Transformer blocks are computed as:

ẑ^l = W-MSA(LN(z^{l−1})) + z^{l−1}
z^l = MLP(LN(ẑ^l)) + ẑ^l
ẑ^{l+1} = SW-MSA(LN(z^l)) + z^l
z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1}

where W-MSA and SW-MSA denote the regular-window and shifted-window multi-head self-attention modules, respectively; ẑ^l and z^l denote the outputs of the (S)W-MSA module and of the MLP module of block l; MLP and LN denote the multilayer perceptron and the layer normalization. The shifted-window mechanism uses a 3D cyclic shift when computing self-attention, and self-attention is computed as:

Attention(Q, K, V) = SoftMax(Q K^T / √d) V

where Q, K and V denote the Query, Key and Value matrices, respectively, and d denotes the dimension of the Query and Key.
the CNN branch adopts a characteristic pyramid structure, wherein the resolution of characteristic mapping is reduced along with the depth of a network, but the number of channels is increased layer by layer; 3D convolution is uniformly adopted, the convolution kernel size is 3 multiplied by 3, each convolution is followed by an LeakyReLU layer, and down-sampling operation is carried out through a maximum pooling layer.
The structure of the double branch is shown in figure 2.
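One downsampling stage of the CNN branch can be sketched as follows (channel counts and the LeakyReLU slope are assumptions):

```python
# Sketch of one CNN-branch stage: 3x3x3 convolution, LeakyReLU,
# then max-pooling that halves each spatial dimension.
import torch.nn as nn

def cnn_stage(in_channels: int, out_channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2),
        nn.MaxPool3d(kernel_size=2),
    )
```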
In the third step, at each stage of the Swin Transformer, the Swin Transformer feature maps and the CNN feature maps of corresponding resolution undergo feature interaction and fusion through a dual-branch feature coupling module. The CNN branch first applies a 3 × 3 × 3 convolution to the downsampled feature map from the previous layer; this feature map is then adaptively aligned with the Swin Transformer feature map by a 1 × 1 × 1 convolution, regularized by a LayerNorm module, and added to the Swin Transformer feature map. The Swin Transformer branch then feeds the fused features into Swin Transformer blocks to obtain a new feature representation, which is aligned by a 1 × 1 × 1 convolution with a BatchNorm module and added to the CNN feature map. Finally, a 3 × 3 × 3 convolution adaptively adjusts the aggregated features, further improving registration accuracy. The details of the dual-branch feature coupling module are shown in Fig. 4.
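A minimal sketch of this coupling module follows; channel counts, the feature-map layouts, and the omission of the intermediate Swin Transformer blocks are assumptions made only for illustration:

```python
# Sketch of bidirectional CNN <-> Swin feature coupling at one resolution level.
# Both inputs are assumed to be 5D tensors (N, C, D, H, W) of matching spatial size.
import torch.nn as nn

class FeatureCoupling(nn.Module):
    def __init__(self, cnn_ch: int, swin_ch: int):
        super().__init__()
        self.pre = nn.Conv3d(cnn_ch, cnn_ch, 3, padding=1)   # refine downsampled CNN map
        self.to_swin = nn.Conv3d(cnn_ch, swin_ch, 1)         # align CNN -> Swin channels
        self.ln = nn.LayerNorm(swin_ch)
        self.to_cnn = nn.Conv3d(swin_ch, cnn_ch, 1)          # align Swin -> CNN channels
        self.bn = nn.BatchNorm3d(cnn_ch)
        self.post = nn.Conv3d(cnn_ch, cnn_ch, 3, padding=1)  # adjust aggregated features

    def forward(self, cnn_feat, swin_feat):
        c = self.pre(cnn_feat)
        # CNN -> Swin: 1x1x1 conv, LayerNorm over channels, add to the Swin map.
        t = self.to_swin(c).permute(0, 2, 3, 4, 1)           # channels-last for LayerNorm
        swin_out = swin_feat + self.ln(t).permute(0, 4, 1, 2, 3)
        # Swin -> CNN: 1x1x1 conv + BatchNorm, add to the CNN map, then 3x3x3 conv.
        # (The Swin Transformer blocks applied between the two directions are omitted here.)
        cnn_out = self.post(c + self.bn(self.to_cnn(swin_feat)))
        return cnn_out, swin_out
```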
In the fourth step, the decoder adaptively adjusts the deep features from the encoder and the features from the upper layer, and finally outputs the deformation field φ between the floating image and the fixed image. Each encoder feature map is connected with the upper-layer feature map from the decoding path by a skip connection and then passed through two consecutive 3 × 3 × 3 convolutional layers, and an upsampling layer doubles the resolution of the feature map. Except for the last convolutional layer, each convolutional layer is followed by a LeakyReLU activation. Finally, the deformation field φ between the input image pair is obtained by a 3 × 3 × 3 convolution. The specific process can be seen in Fig. 2.
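One decoder step can be sketched as below; channel sizes, the interpolation mode and the LeakyReLU slope are assumptions:

```python
# Sketch of one decoder step: skip-connect the encoder map, two 3x3x3 convolutions
# with LeakyReLU, then upsample the feature map by a factor of 2.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch + skip_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv3d(out_ch, out_ch, 3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x, skip):
        x = torch.cat([x, skip], dim=1)                   # skip connection from the encoder
        x = self.act(self.conv2(self.act(self.conv1(x))))
        return F.interpolate(x, scale_factor=2, mode='trilinear', align_corners=False)

# A final 3x3x3 convolution would map the last decoder features to a 3-channel
# displacement field (one channel per spatial direction).
```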
In the fifth step, the floating image and the deformation field are input into a spatial transformation network to obtain the registered image M∘φ. The spatial transformation network uses the obtained deformation field φ to nonlinearly warp the floating image M. In the output image, the value at each voxel p is linearly interpolated from the values of its eight neighboring voxels:

M∘φ(p) = Σ_{q ∈ Z(p′)} M(q) Π_{d ∈ {x,y,z}} (1 − |p′_d − q_d|)

where p′ = p + φ(p) is the deformed position of voxel p, Z(p′) is the set of voxels neighboring p′, q is a voxel in this neighbor set, and d ranges over the three spatial directions x, y and z.
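A sketch of this spatial transformation is given below: the displacement field is added to an identity grid and the floating image is resampled with trilinear interpolation. The grid construction and tensor layout are assumptions; the patent only specifies linear interpolation over the neighboring voxels:

```python
# Sketch of warping the floating image M with a dense displacement field phi.
import torch
import torch.nn.functional as F

def warp(moving: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """moving: (N,1,D,H,W); flow: (N,3,D,H,W) displacement in voxel units."""
    n, _, d, h, w = moving.shape
    grids = torch.meshgrid(torch.arange(d), torch.arange(h), torch.arange(w), indexing='ij')
    identity = torch.stack(grids).float().to(moving.device)      # (3, D, H, W)
    coords = identity.unsqueeze(0) + flow                        # sampling locations p + phi(p)
    # Normalise each coordinate to [-1, 1] and reorder to (x, y, z) for grid_sample.
    for i, size in enumerate([d, h, w]):
        coords[:, i] = 2.0 * coords[:, i] / (size - 1) - 1.0
    grid = coords.permute(0, 2, 3, 4, 1)[..., [2, 1, 0]]         # (N, D, H, W, 3)
    return F.grid_sample(moving, grid, mode='bilinear', align_corners=True)
```

For 5D inputs, `grid_sample` with mode `'bilinear'` performs trilinear interpolation, which matches the eight-neighbor linear interpolation described above.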
In the sixth step, the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field are computed, and the network is trained by back-propagation. The loss function L of the network consists of an image similarity term and a deformation field regularization term:

L(F, M, φ) = L_sim(F, M∘φ) + λ L_reg(φ)

where L_sim denotes the image similarity loss, L_reg denotes the deformation-field regularization loss, and λ is the regularization parameter. The local normalized cross-correlation (LNCC), commonly used in image registration, is adopted as the image similarity loss. Denoting the warped image by W = M∘φ:

L_sim(F, W) = − Σ_{p∈Ω} [ Σ_{p_i} (F(p_i) − F̄(p)) (W(p_i) − W̄(p)) ]² / ( [ Σ_{p_i} (F(p_i) − F̄(p))² ] · [ Σ_{p_i} (W(p_i) − W̄(p))² ] )

where Ω denotes the spatial domain of the input image, p denotes a voxel in that domain, p_i runs over a local window of size n³ centered at p, and F̄(p) and W̄(p) denote the mean voxel values of F and W within that window. The L2 norm of the deformation field gradient is used as the regularization loss:

L_reg(φ) = Σ_{p∈Ω} ||∇φ(p)||²

where ∇φ(p) is the difference field between neighboring voxels in Ω, used here as the gradient field.
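A sketch of both loss terms is given below; the window size, the epsilon constants and the mean-based LNCC formulation are assumptions consistent with the description above:

```python
# Sketch of the two loss terms: local normalised cross-correlation (similarity)
# and the L2 norm of the displacement-field gradients (smoothness).
import torch
import torch.nn.functional as F

def lncc_loss(fixed: torch.Tensor, warped: torch.Tensor, win: int = 9, eps: float = 1e-5):
    """fixed, warped: (N,1,D,H,W). Returns the negative mean local NCC."""
    kernel = torch.ones(1, 1, win, win, win, device=fixed.device) / win ** 3
    def local_mean(x):
        return F.conv3d(x, kernel, padding=win // 2)
    mu_f, mu_w = local_mean(fixed), local_mean(warped)
    cross = local_mean(fixed * warped) - mu_f * mu_w
    var_f = local_mean(fixed * fixed) - mu_f ** 2
    var_w = local_mean(warped * warped) - mu_w ** 2
    cc = cross * cross / (var_f * var_w + eps)
    return -cc.mean()

def grad_loss(flow: torch.Tensor):
    """L2 penalty on forward differences of the displacement field (N,3,D,H,W)."""
    dz = flow[:, :, 1:] - flow[:, :, :-1]
    dy = flow[:, :, :, 1:] - flow[:, :, :, :-1]
    dx = flow[:, :, :, :, 1:] - flow[:, :, :, :, :-1]
    return (dz ** 2).mean() + (dy ** 2).mean() + (dx ** 2).mean()

# Total loss: L = lncc_loss(F, M warped by phi) + lambda * grad_loss(phi)
```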
The effect of the present invention can be further illustrated by the following simulation experiments:
simulation conditions
The simulations use two three-dimensional brain datasets, Mindboggle101 and LPBA40.
Mindboggle101 and LPBA40 contain 101 and 40 T1-weighted MR images, respectively. Each Mindboggle101 image has a segmentation mask with 25 anatomical labels, and each LPBA40 image has a segmentation mask with 56 anatomical labels. For the Mindboggle101 dataset, the 42 images (1722 image pairs) of the NKIRS-22 and NKI-TRT-20 subsets were selected for training, and the 20 images (380 pairs) of the OASIS-TRT-20 subset were selected for testing. On the LPBA40 dataset, the first 30 images (870 pairs) were taken as the training set and the remaining 10 images (90 pairs) as the test set. The registration results were evaluated with the Dice coefficient and the 95% Hausdorff distance (HD95). A larger Dice coefficient means a larger overlap between the two regions and thus a better registration; a smaller HD95 value means a smaller distance between the point sets of the two regions and thus a better registration.
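For reference, a minimal sketch of the Dice overlap computed per anatomical label (the helper name and the smoothing epsilon are assumptions):

```python
# Dice coefficient between two segmentation masks for a single anatomical label.
import numpy as np

def dice_coefficient(seg_a: np.ndarray, seg_b: np.ndarray, label: int) -> float:
    a, b = (seg_a == label), (seg_b == label)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + 1e-8)
```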
The experiments are carried out under the Ubuntu 18.04 operating system on two NVIDIA GeForce RTX 2080Ti GPUs with 11 GB of memory each. The software environment is Python 3.7, and the model is implemented with the PyTorch framework. Adam is used as the optimizer, the batch size is set to 1, the learning rate is 1e-4, and the regularization parameter λ is set to 1 on the Mindboggle101 dataset and to 5 on the LPBA40 dataset.
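Putting the pieces together, one unsupervised training iteration under these settings might look like the sketch below; `RegistrationNet`-style model, `warp`, `lncc_loss`, `grad_loss` and the data loader are placeholders from the earlier sketches, not code disclosed by the patent:

```python
# Sketch of the unsupervised training loop (Adam, lr 1e-4, batch size 1).
import torch

def train(model, loader, device, lam: float = 1.0, epochs: int = 500, lr: float = 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for moving, fixed in loader:                         # each a (1,1,D,H,W) volume
            moving, fixed = moving.to(device), fixed.to(device)
            flow = model(torch.cat([moving, fixed], dim=1))  # predicted deformation field phi
            warped = warp(moving, flow)                      # spatial transformation network
            loss = lncc_loss(fixed, warped) + lam * grad_loss(flow)
            optimizer.zero_grad()
            loss.backward()                                  # back-propagate to train the network
            optimizer.step()
```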
Simulation content
To evaluate the performance of the algorithm, the proposed image registration method based on Swin Transformer and CNN dual-branch coupling (Proposed) is compared with other state-of-the-art registration algorithms, including VoxelMorph (VM), ViT-V-Net (V-V-N) and TransMorph (TM). Meanwhile, to demonstrate the effectiveness of fusing the Swin Transformer and CNN encoder branches in the proposed method, VoxelMorph-Huge (VM-H, with an increased number of convolutional channels) and TransMorph-Large (TM-L, with an increased embedding dimension C, more Swin Transformer blocks and more attention heads) are also compared. The hyperparameters of all comparison experiments are kept consistent.
Analysis of simulation experiment results
Table 1 shows the initial values of the two evaluation metrics on the two datasets, the results of the various comparison methods and the results of the method of the invention, together with the inference time of each method. It can be seen that, compared with the other methods, the proposed method achieves the best registration accuracy on the test sets of both the Mindboggle101 and LPBA40 datasets. Compared with VoxelMorph-Huge and TransMorph-Large, the method achieves higher registration accuracy with less inference time, which demonstrates the effectiveness of the complementary dual-branch fusion of the Swin Transformer and the CNN. Qualitative results of the proposed method and the comparison methods are shown in Figs. 5-6. The simulation results on the two real datasets demonstrate the effectiveness of the method.
TABLE 1

Claims (7)

1. An image registration method based on Swin Transformer and CNN dual-branch coupling, characterized by comprising the following steps:
the first step: performing standard preprocessing, including gray value normalization, center cropping, resampling and affine transformation, on all images in the original data;
the second step: concatenating the floating image and the fixed image, feeding the concatenated pair into a registration network, and passing it in parallel through two encoder branches, a Swin Transformer branch and a CNN branch;
the third step: at each stage of the Swin Transformer, performing feature interaction and fusion between the Swin Transformer feature maps and the CNN feature maps of corresponding resolution through a dual-branch feature coupling module;
the fourth step: the decoder adaptively adjusting the deep features from the encoder and the features from the upper layer, and finally outputting a deformation field between the floating image and the fixed image;
the fifth step: inputting the floating image and the deformation field into a spatial transformation network to obtain the registered image;
the sixth step: calculating the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field, and back-propagating to train the network.
2. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein the first step performs standard preprocessing, including gray value normalization, center cropping, resampling and affine transformation, on all images in the original data;
the gray value normalization step rescales the gray values of the image to the [0,1] interval according to:

I_norm = (I − I_min) / (I_max − I_min)

where I_min and I_max denote the minimum and maximum gray values in the image, respectively.
3. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein in the second step the floating image and the fixed image are concatenated, fed into the registration network, and passed in parallel through the Swin Transformer and CNN encoder branches, implemented as follows: a floating image and a fixed image are randomly selected from the preprocessed data, concatenated, fed into the registration network, and passed in parallel through the two encoder branches of the Swin Transformer and the CNN; the floating image and the fixed image are denoted M and F, respectively;
in the Swin Transformer branch, the input image is first divided into non-overlapping 3D patches, each of size 2 × P × P × P; let x_p^i denote the i-th image patch, where i ∈ {1, ..., N} and N is the total number of patches; each patch is flattened and treated as a token, and each token is then projected by a linear mapping layer to a feature representation of dimension C:

z_0 = [x_p^1 E; x_p^2 E; ...; x_p^N E]

where E denotes the linear mapping; the output z_0 has dimension N × C;
after the linear mapping layer, the branch has 4 consecutive stages; the 1st stage consists of the linear mapping layer and several Swin Transformer blocks; each of the other 3 stages consists of a Patch Merging layer and several Swin Transformer blocks; a Swin Transformer block outputs the same number of tokens as its input, while the Patch Merging layer concatenates the features of each group of 2 × 2 × 2 adjacent tokens, producing an 8C-dimensional feature embedding whose dimension is then reduced to 2C by a linear layer; in this branch, the outputs of two consecutive Swin Transformer blocks are computed as:

ẑ^l = W-MSA(LN(z^{l−1})) + z^{l−1}
z^l = MLP(LN(ẑ^l)) + ẑ^l
ẑ^{l+1} = SW-MSA(LN(z^l)) + z^l
z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1}

where W-MSA and SW-MSA denote the regular-window and shifted-window multi-head self-attention modules, respectively; ẑ^l and z^l denote the outputs of the (S)W-MSA module and of the MLP module of block l; MLP and LN denote the multilayer perceptron and the layer normalization; the shifted-window mechanism uses a 3D cyclic shift when computing self-attention, and self-attention is computed as:

Attention(Q, K, V) = SoftMax(Q K^T / √d) V

where Q, K and V denote the Query, Key and Value matrices, respectively, and d denotes the dimension of the Query and Key;
the CNN branch adopts a feature pyramid structure in which the resolution of the feature maps decreases with network depth while the number of channels increases layer by layer; 3D convolutions are used throughout, with a kernel size of 3 × 3 × 3; each convolution is followed by a LeakyReLU layer, and downsampling is performed by a max-pooling layer.
4. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein in the third step, at each stage of the Swin Transformer, the Swin Transformer feature maps and the CNN feature maps of corresponding resolution undergo feature interaction and fusion through a dual-branch feature coupling module; the CNN branch first applies a 3 × 3 × 3 convolution to the downsampled feature map from the previous layer, which is then adaptively aligned with the Swin Transformer feature map by a 1 × 1 × 1 convolution, regularized by a LayerNorm module, and added to the Swin Transformer feature map; the Swin Transformer branch then feeds the fused features into Swin Transformer blocks to obtain a new feature representation, which is aligned by a 1 × 1 × 1 convolution with a BatchNorm module and added to the CNN feature map; finally, a 3 × 3 × 3 convolution adaptively adjusts the aggregated features.
5. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein in the fourth step the decoder adaptively adjusts the deep features from the encoder and the features from the upper layer, and finally outputs the deformation field φ between the floating image and the fixed image; each encoder feature map is connected with the upper-layer feature map from the decoding path by a skip connection and then passed through two consecutive 3 × 3 × 3 convolutional layers, and an upsampling layer doubles the resolution of the feature map; except for the last convolutional layer, each convolutional layer is followed by a LeakyReLU activation; finally, the deformation field φ between the input image pair is obtained by a 3 × 3 × 3 convolution.
6. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein in the fifth step the floating image and the deformation field are input into a spatial transformation network to obtain the registered image M∘φ; the spatial transformation network uses the predicted deformation field φ to nonlinearly warp the floating image M; in the output image, the value at each voxel p is linearly interpolated from the values of its eight neighboring voxels:

M∘φ(p) = Σ_{q ∈ Z(p′)} M(q) Π_{d ∈ {x,y,z}} (1 − |p′_d − q_d|)

where p′ = p + φ(p) is the deformed position of voxel p, Z(p′) is the set of voxels neighboring p′, q is a voxel in this neighbor set, and d ranges over the three spatial directions x, y and z.
7. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein in the sixth step the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field are computed, and the network is trained by back-propagation; the loss function L of the network consists of an image similarity term and a deformation field regularization term:

L(F, M, φ) = L_sim(F, M∘φ) + λ L_reg(φ)

where L_sim denotes the image similarity loss, L_reg denotes the deformation-field regularization loss, and λ is the regularization parameter; the local normalized cross-correlation (LNCC), commonly used in image registration, is adopted as the image similarity loss; denoting the warped image by W = M∘φ:

L_sim(F, W) = − Σ_{p∈Ω} [ Σ_{p_i} (F(p_i) − F̄(p)) (W(p_i) − W̄(p)) ]² / ( [ Σ_{p_i} (F(p_i) − F̄(p))² ] · [ Σ_{p_i} (W(p_i) − W̄(p))² ] )

where Ω denotes the spatial domain of the input image, p denotes a voxel in that domain, p_i runs over a local window of size n³ centered at p, and F̄(p) and W̄(p) denote the mean voxel values of F and W within that window; the L2 norm of the deformation field gradient is used as the regularization loss:

L_reg(φ) = Σ_{p∈Ω} ||∇φ(p)||²

where ∇φ(p) is the difference field between neighboring voxels in Ω, used here as the gradient field.
CN202210650873.1A 2022-06-10 2022-06-10 Image registration method based on Swin Transformer and CNN double-branch coupling Pending CN115082293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210650873.1A CN115082293A (en) 2022-06-10 2022-06-10 Image registration method based on Swin Transformer and CNN double-branch coupling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210650873.1A CN115082293A (en) 2022-06-10 2022-06-10 Image registration method based on Swin Transformer and CNN double-branch coupling

Publications (1)

Publication Number Publication Date
CN115082293A true CN115082293A (en) 2022-09-20

Family

ID=83251729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210650873.1A Pending CN115082293A (en) Image registration method based on Swin Transformer and CNN double-branch coupling

Country Status (1)

Country Link
CN (1) CN115082293A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115795683A (en) * 2022-12-08 2023-03-14 四川大学 Wing profile optimization method fusing CNN and Swin Transformer network
CN115795683B (en) * 2022-12-08 2023-07-21 四川大学 Airfoil optimization method integrating CNN and Swin Transformer network
CN116188816A (en) * 2022-12-29 2023-05-30 广东省新黄埔中医药联合创新研究院 Acupoint positioning method based on cyclic consistency deformation image matching network
CN116012344A (en) * 2023-01-29 2023-04-25 东北林业大学 Cardiac magnetic resonance image registration method based on masked-autoencoder CNN-Transformer
CN116012344B (en) * 2023-01-29 2023-10-20 东北林业大学 Cardiac magnetic resonance image registration method based on masked-autoencoder CNN-Transformer
CN116051519A (en) * 2023-02-02 2023-05-02 广东国地规划科技股份有限公司 Method, device, equipment and storage medium for detecting double-time-phase image building change
CN116051519B (en) * 2023-02-02 2023-08-22 广东国地规划科技股份有限公司 Method, device, equipment and storage medium for detecting double-time-phase image building change
CN116071226A (en) * 2023-03-06 2023-05-05 中国科学技术大学 Electronic microscope image registration system and method based on attention network
CN116958556A (en) * 2023-08-01 2023-10-27 东莞理工学院 Dual-channel complementary spine image segmentation method for vertebral body and intervertebral disc segmentation
CN116958556B (en) * 2023-08-01 2024-03-19 东莞理工学院 Dual-channel complementary spine image segmentation method for vertebral body and intervertebral disc segmentation

Similar Documents

Publication Publication Date Title
CN110738697B (en) Monocular depth estimation method based on deep learning
CN115082293A (en) Image registration method based on Swin Transformer and CNN double-branch coupling
CN111339903B (en) Multi-person human body posture estimation method
CN112651973B (en) Semantic segmentation method based on cascade of feature pyramid attention and mixed attention
CN111612807B (en) Small target image segmentation method based on scale and edge information
WO2023185243A1 (en) Expression recognition method based on attention-modulated contextual spatial information
CN112288011B (en) Image matching method based on self-attention deep neural network
CN111680695A (en) Semantic segmentation method based on reverse attention model
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN113177555B (en) Target processing method and device based on cross-level, cross-scale and cross-attention mechanism
JP7337268B2 (en) Three-dimensional edge detection method, device, computer program and computer equipment
CN110738663A (en) Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method
CN113159232A (en) Three-dimensional target classification and segmentation method
CN115731441A (en) Target detection and attitude estimation method based on data cross-modal transfer learning
CN112001225A (en) Online multi-target tracking method, system and application
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN116563682A (en) Attention scheme and strip convolution semantic line detection method based on depth Hough network
CN110633706B (en) Semantic segmentation method based on pyramid network
CN115457509A (en) Traffic sign image segmentation algorithm based on improved space-time image convolution
Xu et al. Haar wavelet downsampling: A simple but effective downsampling module for semantic segmentation
Gao et al. Robust lane line segmentation based on group feature enhancement
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
Li et al. A new algorithm of vehicle license plate location based on convolutional neural network
CN115457263A (en) Lightweight portrait segmentation method based on deep learning
CN113344110B (en) Fuzzy image classification method based on super-resolution reconstruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination