CN115082293A - Image registration method based on Swin Transformer and CNN dual-branch coupling - Google Patents
- Publication number
- CN115082293A CN115082293A CN202210650873.1A CN202210650873A CN115082293A CN 115082293 A CN115082293 A CN 115082293A CN 202210650873 A CN202210650873 A CN 202210650873A CN 115082293 A CN115082293 A CN 115082293A
- Authority
- CN
- China
- Prior art keywords
- image
- swin
- cnn
- branch
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 42
- 230000008878 coupling Effects 0.000 title claims abstract description 22
- 238000010168 coupling process Methods 0.000 title claims abstract description 22
- 238000005859 coupling reaction Methods 0.000 title claims abstract description 22
- 238000013507 mapping Methods 0.000 claims abstract description 34
- 230000009466 transformation Effects 0.000 claims abstract description 10
- 230000003993 interaction Effects 0.000 claims abstract description 9
- 230000004927 fusion Effects 0.000 claims abstract description 7
- 238000010606 normalization Methods 0.000 claims abstract description 7
- 238000012952 Resampling Methods 0.000 claims abstract description 5
- 238000007781 pre-processing Methods 0.000 claims abstract description 5
- 238000004364 calculation method Methods 0.000 claims description 10
- 230000007246 mechanism Effects 0.000 claims description 4
- 230000006870 function Effects 0.000 claims description 3
- 230000004913 activation Effects 0.000 claims description 2
- 125000004122 cyclic group Chemical group 0.000 claims description 2
- 239000011159 matrix material Substances 0.000 claims description 2
- 238000011176 pooling Methods 0.000 claims description 2
- 238000000605 extraction Methods 0.000 abstract description 2
- 238000013527 convolutional neural network Methods 0.000 description 28
- 238000004088 simulation Methods 0.000 description 5
- 238000013135 deep learning Methods 0.000 description 4
- 238000012360 testing method Methods 0.000 description 4
- 238000012549 training Methods 0.000 description 4
- 238000004422 calculation algorithm Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 230000002457 bidirectional effect Effects 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 230000011218 segmentation Effects 0.000 description 2
- 230000002776 aggregation Effects 0.000 description 1
- 238000004220 aggregation Methods 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 238000012733 comparative method Methods 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 230000015654 memory Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G06T3/147—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G06T3/18—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4007—Interpolation-based scaling, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4038—Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4046—Scaling the whole image or part thereof using neural networks
Abstract
The invention discloses an image registration method based on dual-branch coupling of a Swin Transformer and a CNN. The method comprises the following steps: 1. performing standard preprocessing steps such as gray-value normalization, center cropping and resampling on all images in the original data; 2. concatenating the floating image and the fixed image, feeding the concatenated images into the registration network, and passing them in parallel through two encoder branches, a Swin Transformer branch and a CNN branch; 3. at each stage of the Swin Transformer, performing feature interaction and fusion between the Swin Transformer feature map and the CNN feature map of the corresponding resolution through a dual-branch feature coupling module; 4. the decoder adaptively adjusts the deep features from the encoder and the features from the upper layer, and finally outputs a deformation field between the floating image and the fixed image; 5. inputting the floating image and the deformation field into a spatial transformation network to obtain a registered image; 6. calculating the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field, and performing back propagation to train the network. The Swin Transformer and CNN branches are used jointly for feature extraction, fully exploiting the advantages of both branches and achieving feature complementarity.
Description
Technical Field
The invention belongs to the technical field of image registration, and particularly relates to an optimization method for effectively improving image registration performance.
Background Art
Deformable Image Registration (DIR) is a fundamental task in image processing with important clinical application value, and has recently attracted the attention of many scholars. Many conventional registration methods minimize a cost function iteratively; however, these methods involve a large number of operations, and registering a pair of images requires a large amount of time. In recent years, with the rapid development of Deep Learning (DL), deep-learning-based image registration has attracted researchers due to its short runtime and high precision. In general, deep-learning-based methods can be classified into supervised and unsupervised methods. In image registration, the true deformation field is very difficult to acquire, and manually labeled deformation fields may introduce unnecessary errors. Therefore, supervised methods generally obtain deformation-field labels from conventional algorithms or from simulated deformations; the registration accuracy of these methods, however, depends heavily on the quality of the generated deformation fields. Unsupervised methods have been increasingly pursued, because the network can be trained with guidance from the similarity between the registered and fixed images, without the need for a true deformation field. In recent years, a large number of unsupervised registration methods based on Convolutional Neural Networks (CNNs) have been proposed, all with good results. However, constrained by the limited receptive field of the convolution kernel, CNNs cannot effectively capture long-range correspondences between the moving and fixed images, and their performance is thus limited.
Recently, transform-based network architectures have been introduced into various computer vision tasks due to their powerful capabilities. Unlike convolution operations, the self-attention mechanism in the Transformer has an effective field of infinite size, which enables the Transformer to capture telespatial information. Although a general Transformer has strong long-range modeling capability and can effectively capture long-distance position corresponding relation, the number of voxels in an image registration task is too large, and a network is difficult to find a real corresponding voxel pair. Meanwhile, due to the characteristics of the convolution kernel, the capture capability of the CNN for the local detail information is far better than that of the Transformer. In addition, the Transformer divides the original image into a plurality of windows, and the windows lack interaction. In the image registration task, since the positions of the corresponding voxel pairs of the fixed image and the floating image are different, they are likely to exist in two different windows respectively, so that they are difficult to match with each other. In order to enhance the capture efficiency of the local relationship, the Swin Transformer local window is self-attentive, and the efficiency is greatly improved while the performance is improved. The Swin Transformer calculates self-attention under each window, and introduces a shift window operation for better information interaction with other windows. The shift window appears very bright in general visual tasks, which in practice is achieved indirectly by shifting the feature map. In the task of image registration, the significance of such operations may be small, and the positional relationship of corresponding points within different windows still cannot be effectively captured. 
Because the CNN convolution kernel slides over the feature map with overlap, it can effectively avoid the situation in the Transformer where corresponding points in different windows cannot be captured.
Disclosure of Invention
The invention discloses an image registration method based on Swin Transformer and CNN dual-branch coupling. The method designs a novel dual-branch coupled network structure: a U-shaped network formed by a classical encoder and decoder. The encoder consists of a Swin Transformer branch and a CNN branch, and can effectively exploit both the Transformer-based self-attention features and the CNN-based convolution features. A feature coupling module complementarily fuses the Swin Transformer feature maps with the CNN feature maps in an interactive manner, fully promoting the feature expression capability of the two encoder branches and further improving registration performance.
The technical solution for realizing the invention is as follows: an image registration method based on Swin Transformer and CNN dual-branch coupling, comprising the following steps:
the first step: performing standard preprocessing of gray-value normalization, center cropping, resampling and affine transformation on all images in the original data;
the second step: concatenating the floating image and the fixed image, feeding the concatenated images into the registration network, and passing them in parallel through two encoder branches, a Swin Transformer branch and a CNN branch;
the third step: at each stage of the Swin Transformer, performing feature interaction and fusion between the Swin Transformer feature map and the CNN feature map of the corresponding resolution through a dual-branch feature coupling module;
the fourth step: the decoder adaptively adjusts the deep features from the encoder and the features from the upper layer, and finally outputs a deformation field between the floating image and the fixed image;
the fifth step: inputting the floating image and the deformation field into a spatial transformation network to obtain a registered image;
the sixth step: calculating the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field, and performing back propagation to train the network.
Compared with the prior art, the invention has the following notable features: (1) a Swin Transformer encoder and a CNN encoder are designed in parallel, simultaneously fusing the Swin-Transformer-based attention features and the CNN-based convolution features, which enhances the generalization capability of the model. (2) A bidirectional interaction mechanism promotes the feature extraction capability of the Swin Transformer and the CNN, while making their feature maps complement each other. (3) The network is an unsupervised end-to-end model in which all modules are trained and inferred in a unified manner, requiring no extra labels for training. (4) The method achieves fast registration speed together with high registration accuracy.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a diagram of a network architecture of the present invention.
FIG. 3 is a schematic of the Swin Transformer block.
Fig. 4 is a diagram of the dual-branch feature coupling module.
Fig. 5 shows the fixed and floating images and the registration results of different methods on the LPBA40 dataset.
Fig. 6 shows difference maps between the registered images of different methods and the fixed image on the LPBA40 dataset.
Detailed Description
The invention designs a registration network based on dual-branch coupling of a Swin Transformer and a CNN. The method adopts a parallel design in which the Swin-Transformer-based self-attention features and the CNN-based convolution features mutually promote each other through bidirectional interaction, enhancing their respective feature representations and thereby capturing accurate spatial correspondences between the input floating and fixed images. The network architecture of the invention is shown in Fig. 2.
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, the steps of the present invention will be described in detail.
In the first step, all images in the original data are preprocessed with the standard steps of gray-value normalization, center cropping, resampling and affine transformation. The gray-value normalization step scales the gray values of the image to the [0, 1] interval according to:

I_norm = (I − I_min) / (I_max − I_min)

where I_min and I_max represent the minimum and maximum gray values in the image, respectively.
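The normalization step above can be sketched in a few lines of NumPy; the function name and the toy volume are illustrative, not part of the patent:

```python
import numpy as np

def normalize_intensity(image: np.ndarray) -> np.ndarray:
    """Min-max normalize voxel intensities to [0, 1]:
    I_norm = (I - I_min) / (I_max - I_min)."""
    i_min, i_max = image.min(), image.max()
    return (image - i_min) / (i_max - i_min)

# Toy 3D volume with arbitrary intensity range
vol = np.random.default_rng(0).normal(100.0, 20.0, size=(8, 8, 8))
norm = normalize_intensity(vol)
```

The minimum voxel maps exactly to 0 and the maximum to 1, so downstream similarity losses operate on a common intensity scale for both input images.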
In the second step, the floating image and the fixed image are concatenated, fed into the registration network, and passed in parallel through the Swin Transformer and CNN encoder branches. The floating image and the fixed image are denoted M and F, respectively.
In the Swin Transformer branch, the input image is first divided into non-overlapping 3D patches, each of size P×P×P. Let x^i denote the i-th patch, where i ∈ {1, …, N} and N is the total number of patches. Each patch is flattened and treated as a token, and each token is then projected to a C-dimensional feature representation using a linear mapping layer:

z_0 = [x^1 E; x^2 E; …; x^N E]

where E denotes the linear mapping; the output z_0 has dimension N×C.
After the linear mapping layer, the branch has 4 consecutive stages. The 1st stage consists of a linear mapping layer and several Swin Transformer blocks; each of the other 3 stages consists of a Patch Merging layer and several Swin Transformer blocks. A Swin Transformer block outputs the same number of tokens as it receives, while the Patch Merging layer concatenates the features of each group of 2×2×2 neighboring tokens, producing an 8C-dimensional feature embedding, whose dimension is then reduced to 2C by a linear layer. In this branch, the outputs of two consecutive Swin Transformer blocks are calculated as follows:

ẑ^l = W-MSA(LN(z^(l−1))) + z^(l−1)
z^l = MLP(LN(ẑ^l)) + ẑ^l
ẑ^(l+1) = SW-MSA(LN(z^l)) + z^l
z^(l+1) = MLP(LN(ẑ^(l+1))) + ẑ^(l+1)
where W-MSA and SW-MSA are the window-based and shifted-window multi-head self-attention modules, respectively; ẑ^l and z^l denote the outputs of W-MSA and SW-MSA; MLP and LN denote the multilayer perceptron and the layer-normalization layer, respectively. The shifted-window mechanism uses a 3D cyclic shift to compute self-attention according to:

Attention(Q, K, V) = SoftMax(QK^T / √d) V

where Q, K and V denote the Query, Key and Value matrices, respectively, and d denotes the dimension of Query and Key.
the CNN branch adopts a characteristic pyramid structure, wherein the resolution of characteristic mapping is reduced along with the depth of a network, but the number of channels is increased layer by layer; 3D convolution is uniformly adopted, the convolution kernel size is 3 multiplied by 3, each convolution is followed by an LeakyReLU layer, and down-sampling operation is carried out through a maximum pooling layer.
The structure of the double branch is shown in figure 2.
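The downsampling step between pyramid levels of the CNN branch can be sketched as a 2×2×2 max pooling over a channel-first feature map; the helper name and sizes are illustrative:

```python
import numpy as np

def max_pool_3d(feat: np.ndarray, k: int = 2) -> np.ndarray:
    """2x2x2 max pooling between CNN pyramid levels.
    feat has shape (C, D, H, W); each spatial dim must be divisible by k."""
    c, d, h, w = feat.shape
    x = feat.reshape(c, d // k, k, h // k, k, w // k, k)
    return x.max(axis=(2, 4, 6))   # max over each k*k*k block

rng = np.random.default_rng(6)
level0 = rng.normal(size=(16, 8, 8, 8))
level1 = max_pool_3d(level0)       # spatial resolution halves
```

In the actual branch a convolution after pooling would also increase the channel count; this sketch shows only the resolution reduction.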
In the third step, at each stage of the Swin Transformer, feature interaction and fusion are performed between the Swin Transformer feature map and the CNN feature map of the corresponding resolution through the dual-branch feature coupling module. The CNN branch first applies a 3×3×3 convolution to extract the downsampled feature map of the upper layer, then adaptively aligns it with the Swin Transformer feature map through a 1×1×1 convolution, regularizes it with a LayerNorm module, and adds it to the Swin Transformer feature map. The Swin Transformer branch then feeds the fused features into Swin Transformer blocks to obtain a new feature representation, which is aligned by a 1×1×1 convolution with a BatchNorm module and added back to the CNN feature map. Finally, a 3×3×3 convolution adaptively adjusts the aggregated features, further improving registration accuracy. The details of the dual-branch feature coupling module are shown in Fig. 4.
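One direction of this coupling (CNN → Swin) can be sketched in NumPy: a 1×1×1 convolution is just a per-voxel linear map over channels, followed by a LayerNorm-style normalization and an element-wise add. The function names, channel counts and the simplified normalization are assumptions for illustration only:

```python
import numpy as np

def conv1x1x1(feat: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """A 1x1x1 convolution = per-voxel channel mixing.
    feat: (C_in, D, H, W); weight: (C_out, C_in)."""
    c, d, h, w = feat.shape
    return (weight @ feat.reshape(c, -1)).reshape(weight.shape[0], d, h, w)

def couple_into_transformer(cnn_feat, swin_feat, align_w):
    """Align the CNN feature map to the Swin branch with a 1x1x1 conv,
    normalize over channels (LayerNorm-like), and add it to the Swin
    features; the reverse (Swin -> CNN) direction is symmetric."""
    aligned = conv1x1x1(cnn_feat, align_w)
    mu = aligned.mean(axis=0, keepdims=True)
    sigma = aligned.std(axis=0, keepdims=True) + 1e-5
    return swin_feat + (aligned - mu) / sigma

rng = np.random.default_rng(2)
cnn = rng.normal(size=(8, 4, 4, 4))     # CNN feature map, C=8
swin = rng.normal(size=(16, 4, 4, 4))   # Swin feature map, C=16
w = rng.normal(size=(16, 8))            # channel-alignment weights
fused = couple_into_transformer(cnn, swin, w)
```

The additive fusion keeps each branch's resolution unchanged, so the same module can be dropped in at every encoder stage.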
In the fourth step, the decoder adaptively adjusts the deep features from the encoder and the features from the upper layer, and finally outputs the deformation field φ between the floating image and the fixed image. The encoder feature maps are connected with the upper-layer feature maps from the decoding path by skip connections, then passed through two consecutive 3×3×3 convolutional layers, and the resolution of the feature maps is doubled by an upsampling layer. Except for the last convolutional layer, each convolutional layer is followed by a LeakyReLU activation. Finally, the deformation field φ between the input image pair is obtained by a 3×3×3 convolution. The specific process can be seen in Fig. 2.
In the fifth step, the floating image and the deformation field are input into the spatial transformation network to obtain the registered image M∘φ. The deformation field φ predicted by the network is used to nonlinearly warp the floating image M. In the output image, for each voxel p, the values of the eight neighboring voxels are linearly interpolated:

M∘φ(p) = Σ_{q ∈ Z(p′)} M(q) ∏_{d ∈ {x, y, z}} (1 − |p′_d − q_d|)

where p′ = p + φ(p) is the warped position, Z(p′) is the set of its eight neighboring voxels, q is a voxel in that set, and d runs over the three spatial directions x, y and z.
In the sixth step, the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field are calculated, and the network is trained by back propagation. The loss function L of the network consists of an image similarity term and a deformation-field regularization term:

L = L_sim(F, M∘φ) + λ L_reg(φ)

where L_sim denotes the image similarity loss, L_reg denotes the deformation-field regularization loss, and λ is the regularization parameter. Local normalized cross-correlation (LNCC), commonly used in the field of image registration, is adopted as the image similarity loss:

L_sim(F, M∘φ) = −Σ_{p ∈ Ω} [ Σ_{p_i} (F(p_i) − F̄(p)) ((M∘φ)(p_i) − (M∘φ)‾(p)) ]² / ( Σ_{p_i} (F(p_i) − F̄(p))² · Σ_{p_i} ((M∘φ)(p_i) − (M∘φ)‾(p))² )

where Ω denotes the spatial domain of the input images, p denotes a voxel in that domain, and F̄(p) and (M∘φ)‾(p) denote the average voxel values within the local window of size n³ centered on voxel p, over which p_i iterates. The L2 norm of the deformation-field gradient is used as the regularization loss:

L_reg(φ) = Σ_{p ∈ Ω} ||∇φ(p)||²
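A NumPy sketch of the two loss terms, under simplifying assumptions labeled in the comments: the LNCC variant here uses non-overlapping n³ windows rather than one window per voxel, and the regularizer uses forward differences for ∇φ:

```python
import numpy as np

def lncc(fixed: np.ndarray, warped: np.ndarray, n: int = 3) -> float:
    """Simplified LNCC: mean squared normalized cross-correlation over
    non-overlapping n^3 windows (the patent's version centers a window
    on every voxel; this stride-n variant is for illustration)."""
    total, count = 0.0, 0
    d, h, w = fixed.shape
    for i in range(0, d - n + 1, n):
        for j in range(0, h - n + 1, n):
            for k in range(0, w - n + 1, n):
                f = fixed[i:i+n, j:j+n, k:k+n]
                g = warped[i:i+n, j:j+n, k:k+n]
                f = f - f.mean()
                g = g - g.mean()
                num = (f * g).sum() ** 2
                den = (f * f).sum() * (g * g).sum() + 1e-9
                total += num / den
                count += 1
    return total / count

def grad_l2(flow: np.ndarray) -> float:
    """L2 norm of the deformation-field gradient via forward differences;
    flow has shape (3, D, H, W)."""
    return sum(float((np.diff(flow, axis=ax + 1) ** 2).sum()) for ax in range(3))

rng = np.random.default_rng(4)
img = rng.normal(size=(6, 6, 6))
sim_self = lncc(img, img)       # identical images -> near-perfect correlation
```

A training step would then minimize `-sim + lam * reg`, pushing the warped image toward the fixed image while keeping the deformation smooth.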
The effect of the present invention can be further illustrated by the following simulation experiments:
simulation conditions
The simulations use two three-dimensional brain datasets, Mindboggle101 and LPBA40.
Mindboggle101 and LPBA40 contain 101 and 40 T1-weighted MR images, respectively. Each Mindboggle101 image has a segmentation mask with 25 anatomical labels, and each LPBA40 image has a segmentation mask with 56 anatomical labels. For the Mindboggle101 dataset, the 42 images (1,722 image pairs) in the NKIRS-22 and NKI-TRT-20 subsets were selected for training, and the 20 images (380 pairs) in the OASIS-TRT-20 subset were selected for testing. On the LPBA40 dataset, the first 30 images (870 pairs) were taken as the training set and the remaining 10 images (90 pairs) as the test set. The registration results were evaluated with the Dice coefficient and the 95th-percentile Hausdorff distance (HD95). The larger the Dice coefficient, the larger the overlap between the two regions and the better the registration; the smaller the HD95 value, the smaller the distance between the point sets of the two regions and the better the registration.
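The Dice coefficient used for evaluation is straightforward to compute from two binary anatomical masks; a short sketch (the toy masks are illustrative):

```python
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice overlap of two binary masks: 2|A intersect B| / (|A| + |B|)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum())

a = np.zeros((4, 4, 4), dtype=bool); a[:2] = True   # 32 voxels
b = np.zeros((4, 4, 4), dtype=bool); b[1:3] = True  # 32 voxels, 16 shared
overlap = dice(a, b)                                # -> 0.5
```

For a multi-label evaluation such as the 25 or 56 anatomical labels here, the per-label Dice scores are typically averaged.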
The experiments were carried out under the Ubuntu 18.04 operating system on two NVIDIA GeForce RTX 2080Ti GPUs with 11 GB of memory each; the software environment is Python 3.7, and the model is implemented on the PyTorch framework. Adam is used as the optimizer, the batch size is set to 1, the learning rate is 1e-4, and the regularization parameter λ is set to 1 on the Mindboggle101 dataset and to 5 on the LPBA40 dataset.
Simulation content
To test the performance of the algorithm, the proposed image registration method based on Swin Transformer and CNN dual-branch coupling (Proposed) is compared with other currently internationally advanced registration algorithms. The comparison methods include VoxelMorph (VM), ViT-V-Net (V-V-N) and TransMorph (TM), among others. Meanwhile, to demonstrate the effectiveness of fusing the Swin Transformer and CNN encoder branches in the proposed method, VoxelMorph-Huge (VM-H, with an increased number of convolutional-layer channels) and TransMorph-Large (TM-L, with an increased embedding dimension C, number of Swin Transformer blocks and number of attention heads) are also compared. The hyperparameters of all comparison experiments were kept consistent.
Analysis of simulation experiment results
Table 1 shows the initial values of the two evaluation indices on the two datasets, the results of the various comparison methods and the results of the proposed method, together with the inference time of each method. The proposed method achieves the best registration accuracy on the test sets of both the Mindboggle101 and LPBA40 datasets compared with the other methods. Compared with VoxelMorph-Huge and TransMorph-Large, the proposed method attains higher registration accuracy with less inference time, demonstrating the effectiveness of the complementary dual-branch fusion of Swin Transformer and CNN. The results of the proposed method and the comparison methods are shown in Figs. 5-6. The simulation results on the two groups of real datasets demonstrate the effectiveness of the method.
TABLE 1
Claims (7)
1. An image registration method based on Swin Transformer and CNN dual-branch coupling, characterized by comprising the following steps:
the first step: performing standard preprocessing of gray-value normalization, center cropping, resampling and affine transformation on all images in the original data;
the second step: concatenating the floating image and the fixed image, feeding the concatenated images into the registration network, and passing them in parallel through two encoder branches, a Swin Transformer branch and a CNN branch;
the third step: at each stage of the Swin Transformer, performing feature interaction and fusion between the Swin Transformer feature map and the CNN feature map of the corresponding resolution through a dual-branch feature coupling module;
the fourth step: the decoder adaptively adjusts the deep features from the encoder and the features from the upper layer, and finally outputs a deformation field between the floating image and the fixed image;
the fifth step: inputting the floating image and the deformation field into a spatial transformation network to obtain a registered image;
the sixth step: calculating the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field, and performing back propagation to train the network.
2. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein the first step performs standard preprocessing of gray-value normalization, center cropping, resampling and affine transformation on all images in the original data;
the gray-value normalization step scales the gray values of the image to the [0, 1] interval according to:

I_norm = (I − I_min) / (I_max − I_min)

where I_min and I_max represent the minimum and maximum gray values in the image, respectively.
3. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein the second step concatenates the floating image and the fixed image, feeds the concatenated images into the registration network, and passes them in parallel through the Swin Transformer and CNN encoder branches, implemented as follows: a floating image and a fixed image are randomly selected from the processed data, concatenated, fed into the registration network, and passed in parallel through the two encoder branches; the floating image and the fixed image are denoted M and F, respectively;
in the Swin Transformer branch, the input image is first divided into non-overlapping 3D patches, each of size P×P×P; let x^i denote the i-th patch, where i ∈ {1, …, N} and N is the total number of patches; each patch is flattened and treated as a token, and each token is then projected to a C-dimensional feature representation using a linear mapping layer:

z_0 = [x^1 E; x^2 E; …; x^N E]

where E denotes the linear mapping and the output z_0 has dimension N×C;
after the linear mapping layer, the branch has 4 consecutive stages; the 1st stage consists of a linear mapping layer and several Swin Transformer blocks; each of the other 3 stages consists of a Patch Merging layer and several Swin Transformer blocks; a Swin Transformer block outputs the same number of tokens as it receives, while the Patch Merging layer concatenates the features of each group of 2×2×2 neighboring tokens, producing an 8C-dimensional feature embedding, whose dimension is then reduced to 2C by a linear layer; in this branch, the outputs of two consecutive Swin Transformer blocks are calculated as follows:

ẑ^l = W-MSA(LN(z^(l−1))) + z^(l−1)
z^l = MLP(LN(ẑ^l)) + ẑ^l
ẑ^(l+1) = SW-MSA(LN(z^l)) + z^l
z^(l+1) = MLP(LN(ẑ^(l+1))) + ẑ^(l+1)

where W-MSA and SW-MSA are the window-based and shifted-window multi-head self-attention modules, respectively; ẑ^l and z^l denote the outputs of W-MSA and SW-MSA; MLP and LN denote the multilayer perceptron and the layer-normalization layer, respectively; the shifted-window mechanism uses a 3D cyclic shift to compute self-attention according to:

Attention(Q, K, V) = SoftMax(QK^T / √d) V

where Q, K and V denote the Query, Key and Value matrices, respectively, and d denotes the dimension of Query and Key;
the CNN branch adopts a feature pyramid structure, in which the resolution of the feature maps decreases with network depth while the number of channels increases layer by layer; 3D convolutions with kernel size 3×3×3 are used throughout, each followed by a LeakyReLU layer, and the downsampling operation is performed by a max pooling layer.
4. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein the third step, at each stage of the Swin Transformer, performs feature interaction and fusion between the Swin Transformer feature map and the CNN feature map of the corresponding resolution through the dual-branch feature coupling module; the CNN branch first applies a 3×3×3 convolution to extract the downsampled feature map of the upper layer, then adaptively aligns it with the Swin Transformer feature map through a 1×1×1 convolution, regularizes it with a LayerNorm module, and adds it to the Swin Transformer feature map; the Swin Transformer branch then feeds the fused features into Swin Transformer blocks to obtain a new feature representation, which is aligned by a 1×1×1 convolution with a BatchNorm module and added back to the CNN feature map; finally, a 3×3×3 convolution adaptively adjusts the aggregated features.
5. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein the fourth-step decoder adaptively adjusts the deep features from the encoder and the features from the upper layer, and finally outputs the deformation field φ between the floating image and the fixed image; the encoder feature maps are connected with the upper-layer feature maps from the decoding path by skip connections, then passed through two consecutive 3×3×3 convolutional layers, and the resolution of the feature maps is doubled by an upsampling layer; except for the last convolutional layer, each convolutional layer is followed by a LeakyReLU activation; finally, the deformation field φ between the input image pair is obtained by a 3×3×3 convolution.
6. The image registration method based on Swin Transformer and CNN dual-branch coupling of claim 1, wherein the fifth step inputs the floating image and the deformation field into the spatial transformation network to obtain the registered image M∘φ; the deformation field φ predicted by the network nonlinearly warps the floating image M; in the output image, for each voxel p, the values of the eight neighboring voxels are linearly interpolated:

M∘φ(p) = Σ_{q ∈ Z(p′)} M(q) ∏_{d ∈ {x, y, z}} (1 − |p′_d − q_d|)

where p′ = p + φ(p) is the warped position, Z(p′) is the set of its eight neighboring voxels, q is a voxel in that set, and d runs over the three spatial directions x, y and z.
7. The Swin Transformer and CNN dual-branch coupling-based image registration method of claim 1, wherein: in the sixth step, the similarity loss between the registered image and the fixed image and the regularization loss of the deformation field are calculated, and the network is trained by back propagation; the loss function L of the network consists of an image similarity term and a deformation-field regularization term:

L = L_sim(F, M∘φ) + λ · L_reg(φ)

where L_sim(F, M∘φ) represents the image similarity loss, L_reg(φ) the deformation-field regularization loss, and λ the regularization parameter; the local normalized cross-correlation (LNCC) commonly used in image registration is adopted as the image similarity loss; writing W = M∘φ for the warped image:

LNCC(F, W) = Σ_{p∈Ω} [ Σ_{p_i} (F(p_i) − F̄(p)) (W(p_i) − W̄(p)) ]² / ( Σ_{p_i} (F(p_i) − F̄(p))² · Σ_{p_i} (W(p_i) − W̄(p))² )

where Ω denotes the spatial domain of the input images, p a voxel in Ω, p_i the voxels in a local window of size n³ centred on p, and F̄(p) and W̄(p) the mean voxel values within that window; the L2 norm of the deformation-field gradient is used as the regularization loss:

L_reg(φ) = Σ_{p∈Ω} ||∇u(p)||²
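Both loss terms can be sketched in NumPy. This is an illustrative version (function names hypothetical) that evaluates LNCC over all overlapping n³ windows and approximates the deformation-field gradient with forward differences:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lncc_loss(fixed, warped, n=3, eps=1e-5):
    """Negative mean local normalized cross-correlation over n^3 windows.

    Returns a value in [-1, 0]; -1 means perfect local correlation."""
    wf = sliding_window_view(fixed, (n, n, n))
    wm = sliding_window_view(warped, (n, n, n))
    axes = (-3, -2, -1)
    f0 = wf - wf.mean(axis=axes, keepdims=True)   # F(p_i) - mean_F(p)
    m0 = wm - wm.mean(axis=axes, keepdims=True)   # W(p_i) - mean_W(p)
    cc = (f0 * m0).sum(axes) ** 2 / (
        (f0 ** 2).sum(axes) * (m0 ** 2).sum(axes) + eps)
    return -cc.mean()

def grad_l2_loss(phi):
    """L2 norm of the deformation-field gradient via forward differences.

    phi: (D, H, W, 3) displacement field."""
    loss = 0.0
    for axis in range(3):
        d = np.diff(phi, axis=axis)
        loss += (d ** 2).mean()
    return loss / 3.0
```

In training, the total loss would then be `lncc_loss(...) + lam * grad_l2_loss(...)`, matching L = L_sim + λ·L_reg above.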
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210650873.1A CN115082293A (en) | 2022-06-10 | 2022-06-10 | Image registration method based on Swin Transformer and CNN dual-branch coupling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115082293A true CN115082293A (en) | 2022-09-20 |
Family
ID=83251729
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210650873.1A Pending CN115082293A (en) | 2022-06-10 | 2022-06-10 | Image registration method based on Swin Transformer and CNN dual-branch coupling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115082293A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115795683A (en) * | 2022-12-08 | 2023-03-14 | 四川大学 | Wing profile optimization method fusing CNN and Swin Transformer network |
CN115795683B (en) * | 2022-12-08 | 2023-07-21 | 四川大学 | Airfoil optimization method integrating CNN and Swin Transformer network |
CN116188816A (en) * | 2022-12-29 | 2023-05-30 | 广东省新黄埔中医药联合创新研究院 | Acupoint positioning method based on cyclic consistency deformation image matching network |
CN116012344A (en) * | 2023-01-29 | 2023-04-25 | 东北林业大学 | Cardiac magnetic resonance image registration method based on masked-autoencoder CNN-Transformer |
CN116012344B (en) * | 2023-01-29 | 2023-10-20 | 东北林业大学 | Cardiac magnetic resonance image registration method based on masked-autoencoder CNN-Transformer |
CN116051519A (en) * | 2023-02-02 | 2023-05-02 | 广东国地规划科技股份有限公司 | Method, device, equipment and storage medium for detecting double-time-phase image building change |
CN116051519B (en) * | 2023-02-02 | 2023-08-22 | 广东国地规划科技股份有限公司 | Method, device, equipment and storage medium for detecting double-time-phase image building change |
CN116071226A (en) * | 2023-03-06 | 2023-05-05 | 中国科学技术大学 | Electronic microscope image registration system and method based on attention network |
CN116958556A (en) * | 2023-08-01 | 2023-10-27 | 东莞理工学院 | Dual-channel complementary spine image segmentation method for vertebral body and intervertebral disc segmentation |
CN116958556B (en) * | 2023-08-01 | 2024-03-19 | 东莞理工学院 | Dual-channel complementary spine image segmentation method for vertebral body and intervertebral disc segmentation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110738697B (en) | Monocular depth estimation method based on deep learning | |
CN115082293A (en) | Image registration method based on Swin Transformer and CNN dual-branch coupling | |
CN111339903B (en) | Multi-person human body posture estimation method | |
CN112651973B (en) | Semantic segmentation method based on cascade of feature pyramid attention and mixed attention | |
CN111612807B (en) | Small target image segmentation method based on scale and edge information | |
WO2023185243A1 (en) | Expression recognition method based on attention-modulated contextual spatial information | |
CN112288011B (en) | Image matching method based on self-attention deep neural network | |
CN111680695A (en) | Semantic segmentation method based on reverse attention model | |
CN112396607A (en) | Streetscape image semantic segmentation method for deformable convolution fusion enhancement | |
CN113177555B (en) | Target processing method and device based on cross-level, cross-scale and cross-attention mechanism | |
JP7337268B2 (en) | Three-dimensional edge detection method, device, computer program and computer equipment | |
CN110738663A (en) | Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method | |
CN113159232A (en) | Three-dimensional target classification and segmentation method | |
CN115731441A (en) | Target detection and attitude estimation method based on data cross-modal transfer learning | |
CN112001225A (en) | Online multi-target tracking method, system and application | |
CN110930378A (en) | Emphysema image processing method and system based on low data demand | |
CN116563682A (en) | Attention scheme and strip convolution semantic line detection method based on depth Hough network | |
CN110633706B (en) | Semantic segmentation method based on pyramid network | |
CN115457509A (en) | Traffic sign image segmentation algorithm based on improved space-time image convolution | |
Xu et al. | Haar wavelet downsampling: A simple but effective downsampling module for semantic segmentation | |
Gao et al. | Robust lane line segmentation based on group feature enhancement | |
CN116596966A (en) | Segmentation and tracking method based on attention and feature fusion | |
Li et al. | A new algorithm of vehicle license plate location based on convolutional neural network | |
CN115457263A (en) | Lightweight portrait segmentation method based on deep learning | |
CN113344110B (en) | Fuzzy image classification method based on super-resolution reconstruction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||