WO2023207266A1 - Image registration method, apparatus, device and storage medium - Google Patents

Image registration method, apparatus, device and storage medium

Info

Publication number
WO2023207266A1
WO2023207266A1 · PCT/CN2023/076415 · CN2023076415W
Authority
WO
WIPO (PCT)
Prior art keywords
displacement
image
moving image
scale
feature
Prior art date
Application number
PCT/CN2023/076415
Other languages
English (en)
French (fr)
Inventor
陈嘉顺
卢东焕
魏东
宁慕楠
施新宇
徐哲
郑冶枫
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2023207266A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Definitions

  • This application relates to the fields of artificial intelligence and image processing technology, specifically to image registration technology.
  • Image registration, as a foundation of digital image processing, performs spatial transformations (such as translation, rotation or scaling) on two or more images of the same area collected at different times, with different modalities, or by different imaging devices.
  • The purpose is to find the optimal spatial transformation (i.e., the optimal deformation field or displacement field).
  • Image registration technology can realize image transformation, feature point extraction, matching and other functions, and has important applications in the field of medical image processing.
  • The purpose of medical image registration is to find an optimal spatial transformation that best aligns the underlying anatomical structures, thereby helping doctors more easily observe and analyze the growth and changes of organs and lesions from multiple angles, making clinical diagnosis more reliable and precise.
  • embodiments of the present application provide an image registration method, device, equipment and storage medium, which can more effectively learn spatial relationships from image features, thereby achieving accurate image registration.
  • the embodiment of the present application provides an image registration method, including:
  • For each of the multiple scales, a displacement field corresponding to the scale is determined based on the respective feature maps of the fixed image and the moving image at the scale, and the displacement field is used to indicate the mapping from the moving image to the fixed image at the scale; in the displacement field, the displacement vector from each voxel in the moving image to the corresponding voxel in the fixed image is a combination of a plurality of displacement basis vectors, and the plurality of displacement basis vectors are obtained based on the feature vector corresponding to the voxel in the feature map of the moving image at the scale;
  • A final displacement field is generated based on the displacement fields corresponding to the multiple scales, and the moving image is transformed based on the final displacement field, so that the transformed moving image is registered with the fixed image.
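  • As an illustrative sketch only (assuming PyTorch; the component names MultiScaleEncoder, DisplacementHead, CascadeFusion and warp are hypothetical placeholders for the components detailed in the embodiments below, not the actual implementation), the overall flow of the method can be outlined as:

```python
# Illustrative outline of the claimed flow; all component objects passed in are
# hypothetical placeholders (assumed PyTorch modules/functions).
def register(moving, fixed, encoder, disp_heads, fusion, warp):
    # extract feature maps of both images at multiple scales (shared weights)
    feats_m, feats_f = encoder(moving), encoder(fixed)    # lists of [B, C_l, D_l, W_l, H_l]
    # per scale: displacement field expressed as a combination of displacement basis vectors
    disp_fields = [head(fm, ff) for head, fm, ff in zip(disp_heads, feats_m, feats_f)]
    # cascade-fuse the per-scale displacement fields into the final displacement field
    final_field = fusion(disp_fields, feats_m, feats_f)   # [B, 3, D, W, H]
    # transform (warp) the moving image so it registers to the fixed image
    return warp(moving, final_field)
```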
  • An embodiment of the present application provides an image registration device, including:
  • An image acquisition module configured to acquire fixed images and moving images to be registered
  • a feature map extraction module configured to obtain feature maps of the fixed image and the moving image at multiple scales
  • a displacement field determination module, configured to, for each of the multiple scales, determine the displacement field corresponding to the scale based on the feature maps of the fixed image and the moving image at the scale, the displacement field being used to indicate the mapping from the moving image to the fixed image at the scale, wherein the displacement vector from each voxel in the moving image to the corresponding voxel in the fixed image is a combination of multiple displacement basis vectors, and the multiple displacement basis vectors are obtained based on the feature vector corresponding to the voxel in the feature map of the moving image at the scale;
  • a displacement field synthesis module, configured to generate a final displacement field based on the displacement fields corresponding to the multiple scales, and perform transformation processing on the moving image based on the final displacement field, so that the transformed moving image is registered with the fixed image.
  • An embodiment of the present application provides a computer device, including: one or more processors; and one or more memories, wherein computer-executable programs are stored in the one or more memories and, when executed by the one or more processors, implement the image registration method as described above.
  • Embodiments of the present application provide a computer-readable storage medium on which computer-executable instructions are stored. When executed by a processor, the instructions are used to implement the image registration method as described above.
  • Embodiments of the present application provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the image registration method according to the embodiment of the present disclosure.
  • The method provided by the embodiments of the present application adopts an image registration model based on unsupervised neural network learning, which greatly improves the registration speed and accuracy, and at the same time alleviates the problem of scarce annotation information in image data sets.
  • The method provided by the embodiments of the present application improves the interpretability of the registration network and reduces the amount of calculation required for registration by promoting the mapping from image representations to spatial relationships instead of directly converting image representations into spatial correspondences.
  • The method provided by the embodiment of the present application extracts feature maps of the moving image and the fixed image in the image pair to be registered at multiple scales; at each scale, based on the feature vector corresponding to each voxel in the corresponding feature map, the displacement vector of each voxel in the moving image is converted into a combination of multiple displacement basis vectors, so that the displacement field is learned from paired feature maps; the final displacement field is then generated based on the displacement fields at all scales, thereby fully utilizing image feature information at different scales and achieving efficient and accurate image registration.
  • Figure 1 is a schematic diagram showing a scenario of processing a registration request from a user terminal in an embodiment of the present application
  • Figure 2A is a flow chart illustrating an image registration method in an embodiment of the present application
  • Figure 2B is a schematic diagram illustrating the multi-scale registration network of the image registration method in the embodiment of the present application.
  • Figure 2C is a schematic diagram showing the application of the multi-scale registration network in the embodiment of the present application.
  • Figure 3A is a flowchart illustrating the determination of a displacement field from the feature maps of the image pair to be registered in an embodiment of the present application
  • Figure 3B is a schematic diagram illustrating the determination of the displacement field from the feature map of the image pair to be registered in an embodiment of the present application
  • Figure 3C is a schematic flow chart illustrating the determination of the displacement field from the feature map of the image pair to be registered in an embodiment of the present application
  • Figure 4A is a schematic diagram illustrating the cascade fusion of multiple displacement fields determined at multiple scales in an embodiment of the present application
  • Figure 4B is a flow chart showing each level of fusion in the cascade fusion in the embodiment of the present application.
  • Figure 4C is a schematic diagram showing the first level of fusion in the cascade fusion in the embodiment of the present application.
  • Figure 5 is a schematic diagram showing the image registration results under different methods and the visualization results of the deformation field at different scales in the embodiment of the present application;
  • Figure 6 is a schematic diagram showing the application of the image registration method in the embodiment of the present application.
  • Figure 7 is a schematic diagram showing an image registration device in an embodiment of the present application.
  • Figure 8 shows a schematic diagram of the image registration device in the embodiment of the present application.
  • Figure 9 shows a schematic diagram of the architecture of an exemplary computing device in an embodiment of the present application.
  • Figure 10 shows a schematic diagram of a storage medium in an embodiment of the present application.
  • the image registration method provided by the embodiment of the present application can be implemented based on artificial intelligence (Artificial intelligence, AI).
  • Rather than identifying the association between the image pairs to be registered with the naked eye and aligning them based on that association, latent features are extracted from the images and the displacement field of the image is determined based on these features to achieve image registration.
  • Through the study of the design principles and implementation methods of various intelligent machines, artificial intelligence enables the image registration method provided by the embodiment of the present application to quickly and accurately extract various latent features in the image, thereby establishing a spatial mapping relationship between voxel pairs in the images to be registered.
  • the image registration method provided by the embodiments of this application can be implemented based on computer vision (Computer Vision, CV) technology. Based on computer vision technology, the image registration method provided by embodiments of the present application can extract feature information from input images, and achieve accurate image registration by effectively learning spatial relationships from image features.
  • the image registration method provided by the embodiments of this application can be based on deep learning (deep learning) technology.
  • Deep learning technology has excellent performance in the field of medical image analysis (such as organ or tumor segmentation, lesion detection and disease diagnosis, etc.) with its powerful learning ability and feature extraction ability.
  • Image registration studies based on deep learning roughly include: the combination of deep networks and traditional registration algorithms, direct registration based on supervised learning models, and direct registration based on unsupervised learning models.
  • In the image registration methods that combine deep networks with traditional registration algorithms, although the deep network improves the efficiency of registration to a certain extent, the method still works within the iterative optimization framework of traditional image registration, and the feature extraction process of the deep network is separated from the image registration task.
  • In the direct registration methods based on supervised learning models, the parameters of the registration model are trained by using the difference loss between the gold-standard deformation and the predicted deformation as the objective function, but the performance of the supervised registration model is usually limited by the supervised label information (that is, the accuracy of the gold-standard deformation field).
  • The direct registration method based on the unsupervised learning model uses the deformation field output by the registration network to spatially transform the image to be registered to generate a deformed image; the registration network is then trained by maximizing the similarity between the deformed image and the reference image, which avoids the need for gold-standard deformation fields and achieves end-to-end unsupervised training of the registration model. Therefore, in the embodiment of the present application, image registration can be achieved by training an unsupervised registration network based on an unsupervised learning method.
  • the image registration method provided by the embodiment of this application may be based on a neural network.
  • Neural networks such as feedforward networks simulate the hierarchical information processing mechanism of the human visual system: by constructing a model with multiple hidden layers, they assign different weights to the information in the input image and automatically extract multi-level, multi-angle image features to better understand the input image.
  • Convolutional Neural Network is a type of feedforward neural network (Feedforward Neural Network, FNN) that contains convolutional calculations and has a deep structure. It is one of the representative algorithms of deep learning. Convolutional neural networks have representation learning capabilities and can perform shift-invariant classification of input information according to its hierarchical structure.
  • The image registration method provided by the embodiment of the present application can learn all the parameters in the registration network based on neural network training for subsequent direct use of the registration network, without the need to perform optimization of the registration metric function in each registration.
  • the image registration method provided by the embodiment of the present application can also be based on the attention mechanism.
  • the essence of the attention mechanism originates from the human visual mechanism and belongs to the brain signal processing mechanism unique to human vision, that is, human attention.
  • A single attention mechanism only averages the learned features and therefore has certain limitations.
  • The multi-head attention mechanism is a special variant of the attention mechanism: it uses functions to calculate similarity and comprehensively learns features from different aspects of the model through multiple transformations, which is equivalent to multiple single attention mechanisms learning the features of different subspaces and then splicing these features together through a linear transformation.
  • Multi-head attention mechanisms use multiple queries to select multiple pieces of information from the input information in parallel, where each attention mechanism focuses on different parts of the input information.
  • the multi-head attention mechanism can be used to help the registration model assign different weights to each feature of the input and extract more critical information, thereby obtaining the feature vector used to describe the image from the input image.
  • The multiple displacement basis vectors and their weights used to represent the spatial transformation relationship fully consider the correlation between the images to be registered and automatically learn the hidden relationships between the features of these images, thereby improving the accuracy of image registration.
  • Figure 1 is a schematic diagram of a scenario for processing a registration request from a user terminal in an embodiment of the present application.
  • a user can initiate an image registration request through his or her user terminal, for example, uploading pairs of images to be registered through a specific interface on the user terminal. Then, the user terminal can respond to the image registration request and transmit the image data to the corresponding server through the network (or directly).
  • the user terminal may specifically include a smartphone, a tablet computer, a laptop computer, a vehicle-mounted terminal, a wearable device, etc.
  • the user terminal may be installed with a browser or a client of various applications (including system applications and third-party applications).
  • The network can be the Internet of Things based on the Internet and/or a telecommunications network; it can be a wired network or a wireless network, for example, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a cellular data communication network, or another electronic network that can realize the information exchange function.
  • the server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server.
  • the server can perform image registration processing in real time based on received image data (eg, data of image pairs to be registered). Subsequently, the server may return the processed image registration results (eg, registered image pairs and corresponding result data) to the user terminal through the network and display them as a response to the user's image registration request.
  • Image registration is a key and difficult task in the field of image processing. It mainly studies alignment and fusion algorithms for images of the same object acquired under different conditions. In daily life, images of the same object often come from different collection devices, are taken at different times, or are captured from different perspectives, so these images need to be registered. Specifically, for a pair of images, it is necessary to find the spatial mapping relationship between the two images through image registration, so as to match the points corresponding to the same position in the two images one by one, which then provides convenience for subsequent operations such as recognition, segmentation and fusion. In various fields such as computer vision, medical image processing and material mechanics, image registration algorithms have wide applications and far-reaching impact.
  • medical imaging technology has developed rapidly, gradually developing from static imaging to dynamic imaging, expanding from form to function, and from flat images to three-dimensional images. Since human organs are easily deformed and human anatomy grows and changes in different periods, doctors often need to observe different medical images with the naked eye for registration, which is more difficult for doctors who lack spatial imagination and subjective medical experience.
  • The purpose of medical image registration is to find an optimal spatial transformation that best aligns the underlying anatomical structures, thereby helping doctors more easily observe and analyze the growth and changes of organs and lesions from multiple angles, making clinical diagnosis more reliable and precise.
  • Medical image registration converts the coordinates of one image (moving image) into another image (fixed image) so that the corresponding positions in the two images match to obtain a registered image.
  • registration methods can be divided into traditional registration methods and learning-based registration methods.
  • The traditional registration method is based on mathematical iterative optimization: it first defines a similarity index (e.g., the L2 norm) and formulates the registration task as an optimization problem with various constraints that combine desired characteristics (such as inverse consistency, symmetry and topology preservation), and then iteratively optimizes a parametric or non-parametric transformation so that the registered moving image has the highest similarity with the fixed image.
  • The learning-based registration method refers to a registration method trained through neural networks, in which parameters are shared: a large amount of data is first used to train the model, and the trained model is then used to register new images. As the application of deep learning in medical image analysis research gradually increases, it has achieved very good results in organ segmentation, lesion detection and classification tasks, and more and more studies have begun to use deep learning methods to deal with image registration problems, achieving better results than traditional algorithms. Among them, unsupervised neural networks greatly improve the registration speed and accuracy in the process of image registration, and at the same time alleviate the problem of scarce annotation information in image data sets.
  • Registration methods based on unsupervised learning usually use the unsupervised medical image registration model (VoxelMorph) as the benchmark and the semantic segmentation network (U-Net) architecture as the main body to learn the dense non-linear correspondence between input image pairs.
  • these registration methods based on unsupervised learning cannot obtain the expected registration performance due to factors such as the limited ability of convolutional neural networks to capture spatial relationships, that is, it is difficult to accurately achieve image registration.
  • There are also registration methods that propose using attention mechanisms to better represent spatial relationships; however, because they focus too much on the long-range dependence between images, they ignore the locality of registration, that is, they ignore the need to find corresponding voxels in local areas in the registration task.
  • Embodiments of the present application provide an image registration method which, by converting the displacement vector at each voxel in the image into a combination of multiple displacement basis vectors determined based on image features, can more effectively learn spatial relationships from image features and thus achieve accurate image registration.
  • The method provided by the embodiments of the present application adopts an image registration model based on unsupervised neural network learning, which greatly improves the registration speed and accuracy, and at the same time alleviates the problem of scarce annotation information in image data sets.
  • The method provided by the embodiments of the present application improves the interpretability of the registration network and reduces the amount of computation required for image registration by facilitating the mapping from image representations to spatial relationships, rather than directly converting image representations into spatial correspondences.
  • The method provided by the embodiment of the present application extracts feature maps of the image pair to be registered at multiple scales and, at each scale, converts the displacement vector of each voxel in the moving image of the image pair into a combination of multiple displacement basis vectors; the combination of multiple displacement basis vectors is determined based on the feature vector corresponding to the voxel in the corresponding feature map, so that the displacement field is learned from the paired feature maps, and the final displacement field is generated based on the displacement fields at all scales, making full use of image feature information at different scales to achieve efficient and accurate image registration.
  • FIG. 2A is a flowchart illustrating the image registration method 200 provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram illustrating the multi-scale registration network of the image registration method provided by the embodiment of the present application.
  • FIG. 2C is a schematic diagram showing the application of the multi-scale registration network in the embodiment of the present application. It should be understood that the image registration method provided by the embodiment of the present application can be executed by a computer device, and the computer device can specifically be a terminal device or a server.
  • step 201 fixed images and moving images to be registered can be obtained.
  • For each voxel in the moving image (a voxel in a three-dimensional image or, alternatively, a pixel in a two-dimensional image), its coordinates are spatially transformed so that they are aligned with the coordinates of the corresponding voxel in the fixed image; that is, the displacement vector of each voxel in the moving image is determined, thereby determining the displacement field of the moving image.
  • A voxel (volume pixel) is conceptually similar to the smallest unit in two-dimensional image space, the pixel; it is the smallest unit in three-dimensional image space. Voxels are usually used in the fields of three-dimensional imaging and medical imaging, while pixels are used in the image data of two-dimensional computer images.
  • The acquired image pair to be registered can be expressed as M, F ∈ ℝ^(D×W×H), where M is the moving image, F is the fixed image, and D, W and H represent the depth, width and height of the moving image and the fixed image respectively. Therefore, the image registration process can be expressed as: for each voxel in the moving image M, find the most similar voxel in the fixed image F.
  • The mapping function between the moving image M and the fixed image F, which is composed of the displacement vectors between each pair of voxels, can usually be represented by a deformation field φ.
  • The image registration method aims to determine the deformation field φ of the moving image M; by using a spatial transformation network, the moving image M is mapped to the fixed image F according to the determined deformation field φ, so that the deformed moving image can be registered to the fixed image F, where the dimensions of the deformation field φ correspond to the three-dimensional spatial transformation of each voxel in the moving image M.
  • Specifically, the deformation field φ can be expressed as φ = Id + u, where Id represents the identity transformation of the moving image M and u represents the displacement field, which consists of the displacement vectors between each pair of voxels in the image pair {M, F} to be registered.
  • For a two-dimensional image of size [W, H], the displacement field size is [W, H, 2]; for a three-dimensional image of size [D, W, H], the displacement field size is [D, W, H, 3], where the displacement field represents the displacement of each voxel/pixel in each direction (e.g., along the x, y and z axes).
  • the spatial transformation network can generate a normalized sampling network based on the displacement field, and use this network to sample the moving image to obtain the registered moving image.
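  • A minimal sketch of such a warping step, assuming PyTorch's grid_sample; the normalization convention and axis ordering below are illustrative assumptions, not the specific sampling network of this application.

```python
# A possible warping step based on torch.nn.functional.grid_sample (an assumption;
# the application's spatial transformation network may be implemented differently).
import torch
import torch.nn.functional as F

def warp(moving, disp):
    """Warp a moving volume with a dense displacement field.

    moving: [B, C, D, W, H]; disp: [B, 3, D, W, H], displacement in voxels,
    with channels ordered like the spatial axes (D, W, H).
    """
    B, _, D, W, H = disp.shape
    # identity grid of voxel coordinates (deformation field = Id + u)
    zz, yy, xx = torch.meshgrid(torch.arange(D), torch.arange(W), torch.arange(H),
                                indexing="ij")
    grid = torch.stack((zz, yy, xx)).float().to(disp.device)    # [3, D, W, H]
    coords = grid.unsqueeze(0) + disp                            # [B, 3, D, W, H]
    # normalize each coordinate channel to [-1, 1] as required by grid_sample
    sizes = torch.tensor([D, W, H], device=disp.device).view(1, 3, 1, 1, 1)
    coords = 2.0 * coords / (sizes - 1) - 1.0
    # grid_sample expects the grid as [B, D, W, H, 3] with the last dimension
    # ordered from the fastest-varying input axis, so flip the channel order
    coords = coords.flip(1).permute(0, 2, 3, 4, 1)
    return F.grid_sample(moving, coords, align_corners=True)
```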
  • step 202 feature maps of each of the fixed image and the moving image at multiple scales can be obtained.
  • Obtaining the feature maps of the fixed image and the moving image at multiple scales may include: using an encoder based on a convolutional neural network, extracting feature maps of the fixed image and the moving image at different scales through different convolution blocks in the encoder, so as to obtain the feature maps of the fixed image and the moving image at multiple scales.
  • the convolutional neural network extracts the characteristics of the target through layer-by-layer abstraction.
  • A shallow-network feature map has high resolution, a small receptive field, strong geometric-information representation ability and weak semantic representation ability, and is suitable for processing small targets; a deep network has a large receptive field and strong semantic-information representation ability, but its feature map resolution is low and its geometric-information representation ability is weak, so it is suitable for processing large targets.
  • low-level detail feature maps capture rich spatial information and highlight the boundaries of organs; while high-level semantic feature maps contain position information and locate the location of organs.
  • In the embodiment of the present application, the moving image M and the fixed image F can be fed to the same encoder (with shared weights) based on a convolutional neural network for feature extraction, so that the encoder extracts a sequence of intermediate feature maps at different scales through different convolution blocks (whose numbers of convolution kernels are C1, C2, C3 and C4 respectively); that is, feature maps at multiple scales are obtained.
  • the size of the feature map of the fixed image or the moving image at a specific scale may be the size of the fixed image or the moving image reduced according to a corresponding ratio.
  • For example, the size of the feature map at scale l can be scaled down to 1/2^l of the size of the original image.
  • the size of the feature map can also be reduced in other ways, and the above-mentioned scaling down method is only used as an example and not a limitation in this application.
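  • A minimal sketch of such a shared-weight multi-scale encoder, assuming PyTorch 3D convolutions; the channel counts, normalization and activation below are illustrative assumptions rather than the specific encoder of this application.

```python
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Toy shared-weight encoder: each block halves the spatial size (scale l = 1/2^l)."""

    def __init__(self, in_ch=1, channels=(16, 32, 64, 128)):  # C1..C4 are illustrative
        super().__init__()
        blocks, prev = [], in_ch
        for c in channels:
            blocks.append(nn.Sequential(
                nn.Conv3d(prev, c, kernel_size=3, stride=2, padding=1),
                nn.GroupNorm(4, c),
                nn.LeakyReLU(0.2, inplace=True)))
            prev = c
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)          # feature map at scale l = 1..4
        return feats

# the same encoder instance is applied to both images, so weights are shared:
# feats_m, feats_f = encoder(moving), encoder(fixed)
```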
  • In the embodiment of the present application, multiple displacement fields can be learned from the pairwise feature maps of the fixed and moving images at multiple scales, and the displacement fields learned at all scales are fused to combine the characteristics of low-level detail feature maps and high-level semantic feature maps, so as to obtain a more accurate displacement field.
  • For each of the multiple scales, a displacement field corresponding to the scale may be determined based on the feature maps of the fixed image and the moving image at the scale, and the displacement field is used to indicate the mapping from the moving image to the fixed image at the scale.
  • a displacement field can be determined for the paired feature maps at each scale, where the displacement field consists of the displacement vectors between pairs of voxels in the image pair to be registered.
  • In the displacement field, the displacement vector from each voxel in the moving image to the corresponding voxel in the fixed image may be a combination of multiple displacement basis vectors, and the multiple displacement basis vectors may be obtained based on the feature vector corresponding to the voxel in the feature map of the moving image at the scale.
  • Determining the displacement field corresponding to the scale based on the feature maps of the fixed image and the moving image at the scale may include the steps shown in Figure 3A.
  • FIG. 3A is a flowchart illustrating the determination of the displacement field according to the feature map of the image pair to be registered in an embodiment of the present application.
  • FIG. 3B is a schematic diagram illustrating the determination of the displacement field from the feature map of the image pair to be registered in an embodiment of the present application.
  • FIG. 3C is a schematic flow chart illustrating the determination of the displacement field from the feature map of the image pair to be registered in an embodiment of the present application.
  • a plurality of displacement basis vectors may be generated based on the feature vector.
  • the combined representation of the above multiple displacement basis vectors can be obtained through two independent branches using the multi-head attention mechanism as shown in Figure 3B .
  • The plurality of displacement basis vectors and their respective weights may be generated based on the feature vectors using K attention heads, wherein N displacement basis vectors can be generated with each attention head, K can be an integer greater than 1, and N can be a positive integer.
  • Generating the N displacement basis vectors can correspond to the branch in the upper half of Figure 3B, which operates on the feature map of the moving image.
  • That is, linear projection can be used to convert each feature vector in the feature map of the moving image into N (for example, 5, as shown in Figure 3B) displacement basis vectors, where each displacement basis vector can consist of three elements that respectively correspond to displacements in the three spatial directions, that is, displacements along the x, y and z axes.
  • the image registration method provided by the embodiment of this application is based on a multi-head attention mechanism.
  • K attention heads are used for each feature vector, and the number of displacement basis vectors of each attention head is N, thus, K ⁇ N displacement basis vectors are generated for each voxel.
  • The feature map of the moving image M at scale l can be regarded as consisting of B × D_l × W_l × H_l feature vectors of dimension C_l.
  • For each feature vector, an N × 3-dimensional vector is output through a fully connected layer, which can be regarded as N three-dimensional displacement basis vectors; therefore, through K attention heads, K × N displacement basis vectors can be extracted for each feature vector.
  • This branch only uses the feature map of the moving image to generate the displacement basis vectors, which is sufficient to extract the most representative displacement basis vectors representing the most likely deformation directions of each voxel.
  • the respective weights of the plurality of displacement basis vectors may be determined for the feature vectors based on respective feature maps of the fixed image and the moving image at the scale.
  • determining respective weights of multiple displacement basis vectors may correspond to the branch in the lower half of FIG. 3B , which operates on the feature maps of the fixed image and the moving image.
  • Each attention head is used to calculate the respective weights of the N displacement basis vectors it generates, based on the concatenation of the feature maps of the fixed image and the moving image at the scale.
  • That is, the feature map obtained by concatenating the feature maps of the moving image and the fixed image (shown as "C" in the figure) is linearly projected to learn the attention weights of the N displacement basis vectors corresponding to each attention head.
  • The similarity between the voxels of the moving image and the fixed image can be learned through a linear layer and a normalized exponential function (softmax function), thereby determining the respective attention weights of the N displacement basis vectors (for example, in Figure 3B, the attention weights corresponding to the above five displacement basis vectors are 0.2, 0.2, 0.4, 0.1 and 0.1 respectively).
  • the displacement vector of the voxel corresponding to the feature vector can be determined.
  • In step 2033, for each feature vector in the feature map of the moving image at the scale, the displacement field corresponding to the scale is determined based on the multiple displacement basis vectors corresponding to the feature vector and their weights.
  • In other words, the directions in which each voxel is most likely to deform are determined; based on the similarity between the moving image and the fixed image, the relative displacement along each of these most likely deformation directions is determined; and the displacement vector of the corresponding voxel is then determined based on these directions and the relative displacements along them.
  • Determining the displacement field corresponding to the scale based on the multiple displacement basis vectors corresponding to the feature vector and their weights may include: for the K attention heads, determining the displacement field corresponding to the scale based on the average over the heads of the products of the N displacement basis vectors and their respective weights; wherein the displacement vector from each voxel in the moving image to the corresponding voxel in the fixed image can be a weighted sum of the plurality of displacement basis vectors.
  • That is, the combined representation of the multiple displacement basis vectors may be a weighted sum of the multiple displacement basis vectors, wherein the weights of these displacement basis vectors are determined by normalizing the attention weights of all displacement basis vectors corresponding to the same feature vector.
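  • Stated as a formula (one consistent reading of the description above; the symbols b and a and the per-head indexing are introduced here for illustration and are not notation from the original disclosure), the displacement vector at voxel p in the displacement field at scale l is

$$u^{l}(p)=\frac{1}{K}\sum_{k=1}^{K}\sum_{n=1}^{N}w_{k,n}(p)\,b_{k,n}(p),\qquad w_{k,n}(p)=\operatorname{softmax}_{n}\big(a_{k,n}(p)\big),$$

    where b_{k,n}(p) is the n-th three-dimensional displacement basis vector of attention head k at voxel p (generated from the feature vector of the moving image), a_{k,n}(p) is the corresponding attention logit learned from the concatenated feature maps of the moving and fixed images, and the softmax normalizes the weights of the displacement basis vectors corresponding to the same feature vector.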
  • Figure 3C shows, in the form of a flow chart, the process of determining the displacement field u_l from the feature maps of the moving image and the fixed image at scale l, in terms of the dimensions of those feature maps.
  • The two processing flows in the flow chart respectively correspond to the two independent branches described above.
  • In one processing flow, the feature map of the moving image is converted through linear projection into K groups of N × 3-dimensional displacement basis vectors, so its dimension changes from B × D_l × W_l × H_l × C_l to B × D_l × W_l × H_l × (K × N × 3); in the other processing flow, the feature maps of the moving image and the fixed image are concatenated, giving a concatenated feature map of dimension B × D_l × W_l × H_l × 2C_l, which is converted through linear projection into the attention weights of the K groups of N displacement basis vectors, so that its dimension changes to B × D_l × W_l × H_l × (K × N × 1).
  • In this way, the displacement fields in the K attention heads can be obtained, with dimensions B × D_l × W_l × H_l × K × 3; therefore, by averaging the displacement field across all attention heads as described above, the displacement field u_l at scale l can be determined, with dimensions B × 3 × D_l × W_l × H_l.
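  • A minimal sketch of the two-branch displacement head consistent with the description of Figures 3B and 3C, assuming PyTorch; the class name, the example values K = 4 and N = 5, and the exact layer shapes are illustrative assumptions rather than the network of this application.

```python
import torch
import torch.nn as nn

class DisplacementHead(nn.Module):
    """Per-scale displacement field from paired feature maps (illustrative sketch)."""

    def __init__(self, feat_ch, num_heads=4, num_bases=5):   # K=4, N=5 are examples
        super().__init__()
        self.K, self.N = num_heads, num_bases
        # upper branch: moving-image features -> K*N displacement basis vectors (3D each)
        self.to_bases = nn.Linear(feat_ch, num_heads * num_bases * 3)
        # lower branch: concatenated (moving, fixed) features -> K*N attention logits
        self.to_weights = nn.Linear(2 * feat_ch, num_heads * num_bases)

    def forward(self, feat_m, feat_f):
        # feat_m, feat_f: [B, C_l, D_l, W_l, H_l]
        B, C, D, W, H = feat_m.shape
        fm = feat_m.permute(0, 2, 3, 4, 1).reshape(B, D * W * H, C)
        ff = feat_f.permute(0, 2, 3, 4, 1).reshape(B, D * W * H, C)

        bases = self.to_bases(fm).view(B, D * W * H, self.K, self.N, 3)
        logits = self.to_weights(torch.cat([fm, ff], dim=-1))
        weights = logits.view(B, D * W * H, self.K, self.N, 1).softmax(dim=3)

        # weighted sum over the N basis vectors, then average over the K heads
        disp = (weights * bases).sum(dim=3).mean(dim=2)          # [B, D*W*H, 3]
        return disp.view(B, D, W, H, 3).permute(0, 4, 1, 2, 3)   # [B, 3, D_l, W_l, H_l]
```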
  • A final displacement field may be generated based on the displacement fields corresponding to the multiple scales, and the moving image may be transformed based on the final displacement field, so that the transformed moving image is registered with the fixed image.
  • A refinement fusion network can be used to fuse these displacement fields, combining the characteristics of low-level detail feature maps and high-level semantic feature maps to obtain a more accurate final displacement field.
  • Generating the final displacement field based on the displacement fields corresponding to the multiple scales may include: performing cascade fusion of the displacement fields corresponding to the multiple scales according to scale size, so as to convert the displacement fields corresponding to the multiple scales into a final displacement field at the original size of the moving image.
  • For example, a cascade fusion network as shown in Figure 4A can be used to generate the final displacement field based on the multiple displacement fields; the cascade fusion network repeatedly applies fusion blocks and convolution heads to convert the displacement fields at L scales into the final displacement field at the original resolution.
  • FIG. 4A is a schematic diagram illustrating the cascade fusion of displacement fields corresponding to multiple scales in an embodiment of the present application.
  • FIG. 4B is a flowchart showing each level of fusion in the cascade fusion according to the embodiment of the present application.
  • FIG. 4C is a schematic diagram showing the first level of fusion in the cascade fusion according to the embodiment of the present application.
  • cascade fusion can be performed in order from large to small scale l.
  • The cascade fusion includes L-1 levels of fusion, in which the output feature map of the fusion block of the l-th level of fusion is denoted as g_l.
  • In the l-th level of fusion, the fusion result g_l of this level can be generated through the fusion block based on the output g_{l+1} of the superior level of fusion, the displacement field corresponding to this level of fusion (with dimensions B × 3 × D_l × W_l × H_l), and the respective feature maps of the moving image and the fixed image at this scale.
  • The displacement fields determined at 4 scales can thus be converted into the final displacement field of dimension B × 3 × D × W × H through three levels of fusion.
  • For the topmost level of fusion, there is no output from a superior level of fusion among its inputs; instead, its input includes g_L, obtained by feature extraction on the displacement field at scale L.
  • cascading the displacement fields corresponding to the multiple scales according to the scale size may include steps 2041-2044 as shown in Figure 4B.
  • In step 2041, in each level of fusion in the cascade fusion, the feature map with a specific number of channels output by the superior level of fusion can be upsampled to obtain a first feature map.
  • That is, the feature map g_{l+1} output by the superior level of fusion can be upsampled to obtain the first feature map.
  • As mentioned above, the size of the feature map at scale l is reduced proportionally to 1/2^l of the size of the original image.
  • Bilinear interpolation upsampling can be performed on the output g_{l+1} of the superior level of fusion; the dimension of the obtained first feature map is B × 3 × D_l × W_l × H_l. It should be understood that other data processing methods that can achieve the same effect can also be applied to the method of the embodiment of the present application.
  • the upsampling method is only used as an example and not a limitation in the embodiment of the present application.
  • step 2042 feature extraction can be performed on the displacement field corresponding to this level of fusion, and the displacement field can be converted into a second feature map with the specific number of channels.
  • The predetermined convolution block may contain three convolutional layers (for example, with 16, 64 and 128 channels respectively and a stride of 1 for each convolutional layer), each of which may be followed by group normalization and an activation function (e.g., a ReLU activation function).
  • The convolution blocks at different scales do not share weights; from large to small scale, their convolution kernel sizes can be set to, for example, (5,5,5), (5,5,3), (5,3,3) and (3,3,3). It should be understood that the specific parameters of the predetermined convolution block can be set according to actual needs; the above parameter settings are only used as examples in the embodiments of the present application and are not limitations.
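  • A minimal sketch of such a predetermined convolution block, assuming PyTorch; the function name, group count and padding choice are illustrative assumptions. Convolution blocks at different scales would be separate instances (no weight sharing), each constructed with its own kernel size.

```python
import math
import torch.nn as nn

def conv_block(in_ch, kernel_size, channels=(16, 64, 128), groups=8):
    """Illustrative predetermined convolution block: three stride-1 convolutional
    layers (e.g. 16, 64 and 128 channels), each followed by group normalization
    and a ReLU activation. kernel_size is set per scale, e.g. (5, 5, 5) down to
    (3, 3, 3); padding keeps the spatial size unchanged."""
    layers, prev = [], in_ch
    for c in channels:
        pad = tuple(k // 2 for k in kernel_size)
        layers += [nn.Conv3d(prev, c, kernel_size, stride=1, padding=pad),
                   nn.GroupNorm(math.gcd(groups, c), c),
                   nn.ReLU(inplace=True)]
        prev = c
    return nn.Sequential(*layers)
```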
  • In step 2043, the first feature map and the second feature map may be added, and the result may be concatenated with the feature maps of the fixed image and the moving image at the scale corresponding to this level of fusion.
  • the upper-level fusion result can be combined with the displacement field information and feature map corresponding to the current fusion to generate the fusion result.
  • The result g_{l+1} of the superior level of fusion (after upsampling) can be added to the feature map h_l extracted from the displacement field corresponding to the current level of fusion (the dimension of the added result is still B × 3 × D_l × W_l × H_l).
  • The result of the addition can then be concatenated with the underlying representations, i.e., the feature maps of the fixed image and the moving image at this scale, where upsample(·) denotes the upsampling operation; therefore, by applying the convolution block shown in (b) in Figure 4C to extract feature information from the concatenation result, the feature map g_l of this level of fusion can be obtained.
  • In step 2044, feature extraction can be performed on the concatenation result to generate a feature map with the specific number of channels as the output of this level of fusion.
  • The last component of the fusion block can be another convolution block, which can reduce the number of channels of the output high-order latent representation (C + 2C_l channels) to the fixed number of channels C.
  • The convolution block may have the same structure and parameters as the predetermined convolution block mentioned above and will not be described in detail here; it should be understood that other approaches that can achieve the same purpose or effect can also be applied to the image registration method provided in the embodiments of this application.
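  • Putting steps 2041-2044 together, a minimal sketch of one fusion level, assuming PyTorch and reusing the conv_block sketch given above; the output channel count (set to 3 to match the dimensions mentioned above) and the trilinear upsampling are illustrative assumptions rather than the exact network of this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
# conv_block refers to the illustrative sketch given above (three stride-1 conv
# layers, each followed by group normalization and ReLU).

class FusionBlock(nn.Module):
    """One level of the cascade fusion (illustrative reading of steps 2041-2044)."""

    def __init__(self, feat_ch, kernel_size, ch=3):   # ch = "specific number of channels" (assumed 3)
        super().__init__()
        self.disp_conv = conv_block(3, kernel_size, channels=(16, 64, ch))               # step 2042
        self.reduce = conv_block(ch + 2 * feat_ch, kernel_size, channels=(16, 64, ch))   # step 2044

    def forward(self, g_up, disp_l, feat_f, feat_m):
        # step 2041: upsample the output of the superior (coarser) fusion level
        g_up = F.interpolate(g_up, size=disp_l.shape[2:], mode="trilinear",
                             align_corners=True)
        # step 2042: extract features from this level's displacement field
        h_l = self.disp_conv(disp_l)
        # step 2043: add, then concatenate with both feature maps at this scale
        fused = torch.cat([g_up + h_l, feat_f, feat_m], dim=1)
        # step 2044: reduce back to the fixed number of channels as this level's output
        return self.reduce(fused)
```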
  • In some embodiments, performing cascade fusion on the displacement fields corresponding to the multiple scales according to scale size may also include: performing feature extraction on the feature map with the specific number of channels output by the first level of fusion in the cascade fusion, so as to generate the final displacement field, for example through a convolutional head (e.g., a convolutional head containing three decoding blocks, each with two consecutive structures containing one convolutional layer followed by an activation function such as a ReLU activation function).
  • In the embodiment of the present application, the intermediate displacement fields determined at multiple scales are fused using a refined fusion network as described with reference to Figures 4A to 4C to generate the final displacement field, which combines the characteristics of low-level detail feature maps and high-level semantic feature maps with the displacement field information and multi-scale feature maps, thereby providing high-resolution, large-deformation mapping capability.
  • the image registration method as described above with reference to steps 201 to 204 can be represented as a schematic flow chart as shown in FIG. 2C.
  • In this way, the deformation field of the moving image in the image pair to be registered can be obtained; therefore, by applying the obtained deformation field to the moving image, the final registered image can be output.
  • the training of the above-mentioned multi-scale registration network may be based on the optimization of the registration loss function.
  • the optimal parameters of the multi-scale registration network can be determined.
  • the determined optimal parameters can be directly applied in subsequent image registration tasks without re-optimizing calculations for real-time tasks each time.
  • The parameters involved in operations such as convolution in the image registration method of the embodiment of the present application can all be determined in advance by optimizing the registration loss function.
  • The generation of the final displacement field may be based on the optimization of a registration loss function.
  • The registration loss function may include a first term for measuring the similarity between the transformed moving image (i.e., the moving image transformed based on the final displacement field) and the fixed image, and a second term used to impose a penalty on the displacement field so as to penalize the local spatial variation of the moving image.
  • The image registration method provided by the embodiment of this application can use most registration loss functions, for example a registration loss function consisting of such a similarity term plus a regularization term weighted by a regularization trade-off parameter.
  • The first term measures the similarity between the transformed moving image and the fixed image through local normalized cross-correlation, computed over the entire voxel set, where P represents the neighboring voxels of voxel p, and the locally mean-subtracted intensities ΔF(q) and Δ[M∘φ](q) are used to eliminate the influence of absolute gray values and better reflect the structural differences between the image pair to be registered.
  • The second term can be a regularization applied to the deformation field to penalize local spatial changes in the moving image.
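  • As one common instantiation consistent with the description above (a VoxelMorph-style formulation; the symbols Ω for the voxel set, P(p) for the local window and λ for the regularization trade-off parameter are introduced here for illustration, and the exact expressions used in this application may differ), the registration loss can be written as

$$\mathcal{L}(F,M,\varphi)=\mathcal{L}_{sim}(F,M\circ\varphi)+\lambda\,\mathcal{L}_{reg}(\varphi),$$

$$\mathcal{L}_{sim}(F,M\circ\varphi)=-\sum_{p\in\Omega}\frac{\Big(\sum_{q\in P(p)}\Delta F(q)\,\Delta[M\circ\varphi](q)\Big)^{2}}{\Big(\sum_{q\in P(p)}\Delta F(q)^{2}\Big)\Big(\sum_{q\in P(p)}\Delta[M\circ\varphi](q)^{2}\Big)},\qquad \mathcal{L}_{reg}(\varphi)=\sum_{p\in\Omega}\big\|\nabla u(p)\big\|^{2},$$

    where ΔF(q) and Δ[M∘φ](q) denote the intensities F(q) and [M∘φ](q) with the local mean over the window P(p) subtracted, eliminating the influence of absolute gray values.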
  • the image registration method provided by the embodiment of the present application provides direct guidance for the determination of the displacement field and the feature extractor at each scale by imposing a penalty on the displacement field at each scale.
  • the displacement field determined at each scale needs to be sampled to the original size to deform the moving image.
  • Therefore, the total registration loss function used in the image registration method provided by the embodiment of the present application can be expressed as the registration loss at the original resolution plus a weighted sum of the intermediate registration losses at the individual scales, where λ_l represents the weight of the intermediate registration loss function at scale l.
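  • A minimal sketch of such a multi-scale registration loss, assuming PyTorch and single-channel volumes; the window size, λ and the per-scale weights are illustrative, warp refers to the sampling sketch given earlier, and this is an assumption-laden illustration rather than the exact loss of this application.

```python
import torch
import torch.nn.functional as F

def lncc_loss(fixed, warped, win=9):
    """Negative local normalized cross-correlation over win^3 windows
    (assumes single-channel volumes of shape [B, 1, D, W, H])."""
    kernel = torch.ones(1, 1, win, win, win, device=fixed.device) / win ** 3
    def local_mean(x):
        return F.conv3d(x, kernel, padding=win // 2)
    df = fixed - local_mean(fixed)            # subtract local means to remove
    dm = warped - local_mean(warped)          # the influence of absolute gray values
    cross = local_mean(df * dm) ** 2
    var = local_mean(df * df) * local_mean(dm * dm)
    return -(cross / (var + 1e-5)).mean()

def smoothness_loss(disp):
    """Penalize local spatial variation of the displacement field (finite differences)."""
    dz = disp[:, :, 1:] - disp[:, :, :-1]
    dy = disp[:, :, :, 1:] - disp[:, :, :, :-1]
    dx = disp[:, :, :, :, 1:] - disp[:, :, :, :, :-1]
    return (dz ** 2).mean() + (dy ** 2).mean() + (dx ** 2).mean()

def total_loss(fixed, moving, final_disp, scale_disps, warp, lam=1.0, scale_w=0.5):
    """Loss at the original resolution plus weighted intermediate losses; each
    intermediate displacement field is upsampled to the original size first."""
    loss = lncc_loss(fixed, warp(moving, final_disp)) + lam * smoothness_loss(final_disp)
    for disp_l in scale_disps:
        up = F.interpolate(disp_l, size=fixed.shape[2:], mode="trilinear",
                           align_corners=True)
        loss = loss + scale_w * (lncc_loss(fixed, warp(moving, up))
                                 + lam * smoothness_loss(up))
    return loss
```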
  • FIG. 5 is a schematic diagram showing the image registration results under different methods and the visualization results of the deformation field at different scales in the embodiment of the present application. Among them, (a) in Figure 5 shows a comparison of medical image registration results using different methods, and (b) in Figure 5 shows a comparison of medical image registration results at different scales using the image registration method provided by the embodiment of the present application. Visualization results of the deformation field.
  • The image registration results shown in (a) and (b) in Figure 5 are registration results of magnetic resonance imaging (MRI) images of the brain, where curves of different shades represent the boundaries of different brain structures; for example, the two symmetrically distributed closed curves at the bottom correspond to the boundaries of the caudate nucleus.
  • the image registration method in the embodiment of the present application can better retain the detailed information of the medical image.
  • the moving image after deformation (marked as "the present disclosure” in the figure) can better match the segmentation annotation of different brain structures.
  • the different colors of the deformation field in (a) and (b) in Figure 5 represent the degree of deformation, that is, the displacement of the voxel point.
  • The darker the color, the greater the displacement, and the direction of the curves in the deformation field indicates the displacement direction of the voxel points.
  • the obtained deformation field gradually becomes blurred and the resolution gradually decreases.
  • FIG. 6 is a schematic diagram showing the application of the image registration method according to the embodiment of the present application.
  • the image registration method provided by the embodiment of the present application can be directly applied as a model.
  • The user can upload pairs of images to be registered (for example, on front end A); after receiving the image pair to be registered, the backend can use a model trained with the image registration method provided by the embodiment of this application to directly register the image pair to be registered, and output the registration result (for example, on front end B).
  • the image registration method provided by the embodiment of the present application can also be deployed through the intelligent open laboratory platform.
  • The user can drag the relevant module (for example, an image registration module) into the display interface, and directly use the image registration method and output the registered image.
  • FIG. 7 is a schematic diagram showing an image registration device 700 in an embodiment of the present application.
  • the image registration device 700 may include an image acquisition module 701, a feature map extraction module 702, a displacement field determination module 703, and a displacement field synthesis module 704.
  • the image acquisition module 701 may be configured to acquire fixed images and moving images to be registered.
  • the feature map extraction module 702 may be configured to obtain feature maps of each of the fixed image and the moving image at multiple scales.
  • The displacement field determination module 703 may be configured to, for each of the multiple scales, determine the displacement field corresponding to the scale based on the respective feature maps of the fixed image and the moving image at the scale, where the displacement field is used to indicate the mapping from the moving image to the fixed image at the scale, and in the displacement field, the displacement vector from each voxel in the moving image to the corresponding voxel in the fixed image is a combination of multiple displacement basis vectors, and the multiple displacement basis vectors are obtained based on the feature vector corresponding to the voxel in the feature map of the moving image at the scale.
  • The displacement field synthesis module 704 may be configured to generate a final displacement field based on the plurality of displacement fields determined at the plurality of scales, and perform transformation processing on the moving image based on the final displacement field, so that the transformed moving image is registered with the fixed image.
  • the displacement field determination module 703 may be specifically configured as:
  • For each feature vector in the feature map of the moving image at the scale, a plurality of displacement basis vectors corresponding to the feature vector are generated based on the feature vector; the respective weights of the multiple displacement basis vectors are determined based on the respective feature maps of the fixed image and the moving image at the scale;
  • a displacement field corresponding to the scale is determined based on the plurality of displacement basis vectors corresponding to the feature vector and their weights.
  • the plurality of displacement basis vectors and their respective weights are generated based on the feature vectors using K attention heads, where K is an integer greater than 1;
  • each attention head is used to generate N displacement basis vectors; N is an integer greater than or equal to 1;
  • the displacement field determination module 703 may be specifically configured as:
  • the displacement vector from each voxel in the moving image to the corresponding voxel in the fixed image is a weighted sum of the plurality of displacement base vectors.
  • the feature map extraction module 702 may be specifically configured as:
  • Using an encoder based on a convolutional neural network, the feature maps of the fixed image and the moving image at different scales are extracted through different convolution blocks in the encoder, so as to obtain the feature maps of the fixed image and the moving image at the multiple scales.
  • the size of the feature map of the fixed image or the moving image at a specific scale is the size of the fixed image or the moving image reduced according to a corresponding ratio.
  • the displacement field synthesis module 704 can be specifically configured as:
  • the displacement fields corresponding to the multiple scales are cascaded and fused to convert the displacement fields corresponding to the multiple scales into a final displacement field under the original size of the moving image.
  • the displacement field synthesis module 704 can be specifically configured as:
  • the feature map with a specific number of channels output by the upper level fusion is upsampled to obtain the first feature map
  • Feature extraction is performed on the splicing result to generate a feature map with the specific number of channels as the output of this level of fusion.
  • the displacement field synthesis module 704 can be specifically configured as:
  • Feature extraction is performed on the feature map with the specific number of channels output by the first-level fusion in the cascade fusion to generate the final displacement field.
  • The generation of the final displacement field is based on the optimization of a registration loss function, which includes a first term used to measure the similarity between the transformed moving image and the fixed image, and a second term that imposes a penalty on the displacement field to penalize local spatial variations of the moving image.
  • an image registration device is also provided, and the image registration device is a computer device.
  • Figure 8 shows a schematic diagram of the image registration device 2000 in the embodiment of the present application.
  • the image registration device 2000 may include one or more processors 2010, and one or more memories 2020.
  • the memory 2020 stores computer readable code, and when the computer readable code is run by the one or more processors 2010, the image registration method as described above can be executed.
  • the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capabilities.
  • The above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The methods, steps, and logical block diagrams disclosed in the embodiments of this application can be implemented or executed by such a processor.
  • The general-purpose processor may be a microprocessor or any conventional processor, and may be of x86 or ARM architecture.
  • The various example embodiments of the present application may be implemented in hardware or special-purpose circuits, software, firmware, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flowcharts, or using some other graphical representation, it will be understood that the blocks, devices, systems, techniques, or methods described herein may be implemented, as non-limiting examples, in hardware, software, firmware, special-purpose circuitry or logic, general-purpose hardware or controllers, other computing devices, or some combination thereof.
  • computing device 3000 may include a bus 3010, one or more CPUs 3020, read only memory (ROM) 3030, random access memory (RAM) 3040, communication port 3050 connected to a network, input/output components 3060, hard disk 3070, etc.
  • the storage device in the computing device 3000 such as the ROM 3030 or the hard disk 3070, can store various data or files used for processing and/or communication of the image registration method provided by the present disclosure, as well as program instructions executed by the CPU.
  • Computing device 3000 may also include user interface 3080.
  • The architecture shown in FIG. 9 is only exemplary, and when implementing different devices, one or more components of the computing device shown in FIG. 9 may be omitted according to actual needs.
  • Figure 10 shows a schematic diagram 4000 of a storage medium in an embodiment of the present application.
  • Non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • Embodiments of the present application also provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image registration method of the embodiment of the present application.
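
A compact restatement of the displacement model described in the items above, using editorial notation (the symbols u(p), b_{i,j}, w_{i,j} are chosen by the editor and are not fixed by the patent text): the displacement vector of a voxel p is the head-averaged weighted sum of its displacement basis vectors,

$$u(p) \;=\; \frac{1}{K}\sum_{j=1}^{K}\sum_{i=1}^{N} w_{i,j}(p)\,b_{i,j}(p), \qquad \sum_{i=1}^{N} w_{i,j}(p) = 1 \ \text{for each head } j,$$

where the basis vectors b_{i,j}(p) are produced from the moving-image feature vector at p, and the weights w_{i,j}(p) are produced from the concatenation of the fixed-image and moving-image feature vectors.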

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide an image registration method, apparatus, device, and computer-readable storage medium. The method extracts feature maps of the moving image and the fixed image of an image pair to be registered at multiple scales; at each scale, based on the feature vector in the corresponding feature map that corresponds to each voxel in the moving image, the displacement vector of each voxel in the moving image is converted into a combination of multiple displacement basis vectors, so as to learn a displacement field from the paired feature maps; and a final displacement field is generated based on the displacement fields at all scales, thereby making full use of image feature information at different scales to achieve efficient and accurate image registration. With the method of the embodiments of the present application, coarse-to-fine displacement fields can be learned at different scales, and by fusing the displacement fields at all scales, the latent information of different representation subspaces can be fully exploited without introducing dense successive network computation.

Description

图像配准方法、装置、设备和存储介质
本申请要求于2022年04月29日提交中国专利局、申请号为2022104789745、申请名称为“图像配准方法、装置、设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能及图像处理技术领域,具体涉及图像配准技术。
背景技术
图像配准作为数字图像处理的基础,用于对不同时间、不同类型或不同成像设备采集的两幅或多幅对应于相同区域的图像进行空间变换(诸如平移、旋转或尺度变换等操作),使之达成一致,其目的是寻找最优空间变换(即最优形变场或位移场)。图像配准技术可以实现图像变换和特征点提取、匹配等功能,在医学影像处理领域具有重要应用。医学图像配准的目的是找到一个最佳的空间变换,最好地对齐基础解剖结构,从而帮助医生更容易地从多个角度观察、分析器官和病灶的生长变化情况,使临床诊断结果更加可靠准确。
传统的非基于学习的图像配准方法将配准任务定义为一个具有各种约束条件的优化问题,以结合期望的特性,诸如逆一致性、对称性和拓扑保序。但是这些方法需要对每对配准图像进行迭代式的密集型计算,优化速度慢、且实现困难。近年来,越来越多的研究开始用深度学习方法处理图像配准问题,获得了优于传统算法的效果。但是,现有的基于无监督学习的图像配准方法由于卷积神经网络在捕捉空间关系方面的能力有限等原因,无法获得期望的配准性能,即难以准确地实现图像配准。
发明内容
为了解决上述问题,本申请实施例提供了一种图像配准方法、装置、设备和存储介质,能够更有效地从图像特征中学习空间关系,从而实现准确的图像配准。
本申请实施例提供了一种图像配准方法,包括:
获取待配准的固定图像和移动图像;
获得所述固定图像和所述移动图像各自在多个尺度下的特征图;
对于所述多个尺度中的每个尺度,根据所述固定图像和所述移动图像在所述尺度下各自的特征图,确定所述尺度对应的位移场,所述位移场用于指示在所述尺度下从所述移动图像到所述固定图像的映射,在所述位移场中,所述移动图像中的每个体素到所述固定图像中的相应体素的位移向量为多个位移基向量的组合,所述多个位移基向量是基于所述移动图像在所述尺度下 的特征图中与所述体素对应的特征向量获得的;
基于所述多个尺度各自对应的位移场生成最终位移场,并基于所述最终位移场对所述移动图像进行变换处理,以使变换后的移动图像与所述固定图像配准。
本申请实施例提供了一种图像配准装置,包括:
图像获取模块,被配置为获取待配准的固定图像和移动图像;
特征图提取模块,被配置为获得所述固定图像和所述移动图像各自在多个尺度下的特征图;
位移场确定模块,被配置为对于所述多个尺度中的每个尺度,根据所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述尺度对应的位移场,所述位移场用于指示在所述尺度下从所述移动图像到所述固定图像的映射,在所述位移场中,所述移动图像中的每个体素到所述固定图像中的相应体素的位移向量为多个位移基向量的组合,所述多个位移基向量是基于所述移动图像在所述尺度下的特征图中与所述体素对应的特征向量获得的;;
位移场合成模块,被配置为基于所述多个尺度各自对应的位移场生成最终位移场,并基于所述最终位移场对所述移动图像进行变换处理,以使变换后的移动图像与所述固定图像配准。
本申请实施例提供了一种计算机设备,包括:一个或多个处理器;以及一个或多个存储器,其中,所述一个或多个存储器中存储有计算机可执行程序,当由所述处理器执行所述计算机可执行程序时,执行如上所述的图像配准方法。
本申请实施例提供了一种计算机可读存储介质,其上存储有计算机可执行指令,所述指令在被处理器执行时用于实现如上所述的图像配准方法。
本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行根据本公开的实施例的图像配准方法。
本申请实施例提供的方法相比于传统的图像配准方法而言,通过采用基于无监督神经网络学习的图像配准模型,极大地提高了配准速度和配准精度,同时缓解了图像数据集标注信息稀缺的问题。此外,与现有的基于无监督学习的配准方法相比,本申请实施例提供的方法通过促进从图像表示到空间关系的映射,而不是将图像表示直接转换为空间对应,提高配准网络的可解释性,且减小了配准所需的计算量。
本申请实施例提供的方法,通过在多个尺度下提取待配准图像对中移动图像和固定图像的特征图,在每个尺度下,基于相应特征图中与移动图像中的每个体素相对应的特征向量,将移动图像中的每个体素的位移向量转换为 多个位移基向量的组合,以从成对的特征图中学习位移场,并基于所有尺度下的位移场生成最终位移场,从而充分利用不同尺度下的图像特征信息,实现高效准确的图像配准。通过本申请实施例的方法能够基于不同尺度学习从粗到细的位移场,并通过对所有尺度下的位移场进行融合,在不引入密集的连续网络计算的情况下,充分利用不同表示子空间的潜在信息。
附图说明
图1是示出本申请实施例中的对来自用户终端的配准请求进行处理的场景示意图;
图2A是示出本申请实施例中的图像配准方法的流程图;
图2B是示出本申请实施例中的图像配准方法的多尺度配准网络的示意图;
图2C是示出本申请实施例中的多尺度配准网络的应用的示意图;
图3A是示出本申请实施例中的从待配准图像对的特征图中确定位移场的流程图;
图3B是示出本申请实施例中的从待配准图像对的特征图中确定位移场的示意图;
图3C是示出本申请实施例中的从待配准图像对的特征图中确定位移场的示意性流程框图;
图4A是示出本申请实施例中的对多个尺度下确定的多个位移场的级联融合的示意图;
图4B是示出本申请实施例中的级联融合中的每级融合的流程图;
图4C是示出本申请实施例中的级联融合中的第l级融合的示意图;
图5是示出本申请实施例中的不同方法下的图像配准结果及不同尺度下的形变场的可视化结果的示意图;
图6是示出本申请实施例中的图像配准方法的应用的示意图;
图7是示出本申请实施例中的图像配准装置的示意图;
图8示出了本申请实施例中的图像配准设备的示意图;
图9示出了本申请实施例中的示例性计算设备的架构的示意图;
图10示出了本申请实施例中的存储介质的示意图。
具体实施方式
为便于描述本申请,以下介绍与本申请有关的概念。
本申请实施例提供的图像配准方法可以基于人工智能(Artificial intelligence,AI)实现。对于基于人工智能的图像配准方法而言,其能够以类似于人类通过肉眼识别待配准图像对之间的关联,并基于该关联对待配准图像对进行对齐的方式,从待配准图像对中提取潜在特征,并基于这些特征来确定图像的位移场,从而实现图像配准。人工智能通过研究各种智能机器 的设计原理与实现方法,使本申请实施例提供的图像配准方法能够快速准确地提取图像中的各种潜在特征,从而建立待配准图像对中体素对之间的空间映射关系。
本申请实施例提供的图像配准方法可以基于计算机视觉(Computer Vision,CV)技术实现。基于计算机视觉技术,本申请实施例提供的图像配准方法可以从输入的图像中提取特征信息,通过从图像特征中有效学习空间关系,实现准确的图像配准。
本申请实施例提供的图像配准方法可以基于深度学习(deep learning)技术。深度学习技术以其强大的学习能力和特征提取能力在医学图像分析领域(诸如器官或肿瘤分割、病灶检测和疾病诊断等)有出色的表现,近年来,一些研究者开展了基于深度学习的医学图像配准研究。这些研究大致包括:深度网络与传统配准算法相结合、基于有监督学习模型的直接配准以及基于无监督学习模型的直接配准。对于深度网络与传统配准算法相结合的图像配准方法,尽管深度网络一定程度地提升了配准的效率,但仍然是在传统图像配准的迭代优化框架中。此外,深度网络的特征提取过程与图像配准任务是分离的,因此,不能确保网络提取出的特征对于配准问题具有适配的描述性。在基于有监督学习的配准模型研究中,通过使用金标准形变与预测形变的差异损失作为目标函数来训练配准模型的参数,但有监督配准模型的性能通常受限于监督标签信息(即金标准形变场)的准确度,对于临床医学图像,训练数据中金标准形变场的获取是非常困难且十分耗时的。基于无监督学习模型的直接配准方法,利用配准网络输出的形变场对待配准图像进行空间变换,产生形变后的图像,然后,通过最大化形变图像和参考图像间的相似性来训练配准网络,避免了对金标准形变场的需求,实现了端到端的无监督训练配准模型。因此,在本申请实施例中,可以基于无监督学习方法,通过训练无监督配准网络来实现图像配准。
可选地,本申请实施例提供的图像配准方法可以基于神经网络。神经网络(例如前馈网络)是学习模拟人脑视觉系统分级处理信息的机制,通过构建多隐含层的模型,为输入图像中的信息分配不同权值,自动提取多层次、多角度的图像特征,从而更好地理解输入图像。卷积神经网络(Convolutional Neural Network,CNN)是一类包含卷积计算且具有深度结构的前馈神经网络(Feedforward Neural Network,FNN),其是深度学习的代表算法之一。卷积神经网络具有表征学习(representation learning)能力,能够按其阶层结构对输入信息进行平移不变分类(shift-invariant classification)。本申请实施例提供的图像配准方法可以基于神经网络训练学习到配准网络中的所有参数,以用于后续对配准网络的直接使用,无需在每次配准时都对配准度量函数进行优化。
可选地,本申请实施例提供的图像配准方法还可以基于注意力机制。注 意力机制的本质源于人类视觉机制,属于人类视觉特有的大脑信号处理机制,即人类的注意力。注意力机制只是将学习到的特征进行平均处理,存在一定的局限性。多头注意力机制(multi-head attention)是注意力机制的一种特殊变形,其利用函数计算相似度,通过多次变换,从模型中的不同方面进行特征的全面学习,相当于多个单个注意力机制学习不同子空间的特征后,通过线性变换组合将这些特征拼接起来。多头注意力机制利用多个查询,来平行地从输入信息中选取多个信息,其中,每个注意力机制关注输入信息的不同部分。在本申请实施例中,可以利用多头注意力机制,帮助配准模型对输入的每个特征赋予不同的权重,抽取出更加关键的信息,从而,从输入图像的特征向量获取用于描述图像的空间变换关系的多个位移基向量及其权重,充分考虑待配准图像之间的关联,自动学习这些图像的特征之间的隐藏关系,以提高图像配准的准确性。
综上所述,本申请实施例提供的方案涉及人工智能、无监督图像配准、多头注意力机制等技术,下面将结合附图对本申请实施例进行进一步地描述。
图1是本申请实施例中对来自用户终端的配准请求进行处理的场景示意图。
在图1中,用户可以通过其用户终端发起图像配准请求,例如,通过用户终端上的特定接口上传成对的待配准图像。接着,用户终端可以响应该图像配准请求,通过网络(或者直接地)向对应的服务器传输这些图像数据。
可选地,用户终端具体可以包括智能手机、平板电脑、膝上型便携计算机、车载终端、可穿戴设备等等。用户终端可以安装有浏览器或各种应用(包括系统应用及第三方应用)的客户端。网络可以是基于互联网和/或电信网的物联网(Internet of Things),其可以是有线网也可以是无线网,例如,其可以是局域网(LAN)、城域网(MAN)、广域网(WAN)、蜂窝数据通信网络等能实现信息交换功能的电子网络。服务器可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是云服务器。
如图1所示,服务器可以基于接收的图像数据(例如,待配准图像对的数据)实时进行图像配准处理。随后,服务器可将经处理得到的图像配准结果(例如,经配准的图像对及相应结果数据)通过网络返回到用户终端并进行显示,作为对用户的图像配准请求的响应。
如今,图像配准是图像处理领域的一个重点、难点任务,它主要研究针对同一对象在不同条件下获取的图像的对齐和融合算法。日常生活中,针对同一对象获得的图像经常来自于不同的采集设备,拍摄于不同的时间,不同的视角等。因此,需要对这些图像进行配准。具体地,对于成对的两幅图像,需要通过图像配准寻找这两幅图像之间的空间映射关系,从而将两图中对应于同一位置的点一一对应起来,进而,为后续的识别、分割、融合等操作提 供便利。在计算机视觉、医学图像处理、以及材料力学等各种领域中,图像配准算法都有广泛的应用和深远的影响。例如,近年来,医学成像技术发展迅速,逐渐从静态成像发展到动态成像,从形态拓展到功能,从平面图像到三维立体图像。由于人体器官易变形,不同时期的人体解剖结构存在生长变化,医生往往需要通过肉眼观察不同的医学图像来配准,这对于缺乏空间想象力和主观医疗经验的医生较为困难。医学图像配准的目的是找到一个最佳的空间变换,最好地对齐基础解剖结构,从而帮助医生更容易地从多个角度观察、分析器官和病灶的生长变化情况,使得临床诊断结果更加可靠准确。医学图像配准通过将一幅图像(移动图像)的坐标转换到另一幅图像(固定图像)中,使得两幅图像中的相应位置匹配,得到配准图像。
根据配准方法依赖的算法不同,可以将配准方法分为传统的配准方法和基于学习的配准方法。传统的配准方法是基于数学迭代优化的方法,其首先定义一个相似性指标(例如,L2范数),将配准任务定义为一个具有各种约束条件的优化问题,以结合期望的特性(诸如逆一致性、对称性和拓扑保序),通过参数化转换或非参数化转换进行不断迭代优化,使得配准后的移动图像与固定图像相似性最高。但是,这样的配准方法需要每次针对新的配准图像对进行迭代式的密集型计算,其优化速度慢且实现困难。
相较于传统的配准方法,基于学习的配准方法具有很大的优势与潜力,因此,有越来越多的研究人员研究该方法,近几年来有不少相关的工作发表。基于学习的配准方法是指通过神经网络训练的配准方法,该方法中参数是共享的,首先利用大量的数据来训练模型,然后用训练好的模型对新的图像进行配准。随着深度学习在医学图像分析研究中的应用逐渐增多,其在器官分割、病灶检测与分类任务中取得了相当好的效果,越来越多的研究开始用深度学习方法处理图像配准问题,并获得了优于传统算法的效果。其中,无监督神经网络在图像配准的过程中极大地提高了配准速度和配准精度,同时缓解了图像数据集标注信息稀缺的问题。近年的基于无监督学习的配准方法中,通常以无监督医学图像配准模型(VoxelMorph)作为基准,以语义分割网络(U-Net)体系结构为主体,学习输入图像对之间的密集非线性对应关系。但这些基于无监督学习的配准方法由于卷积神经网络在捕捉空间关系方面的能力有限等因素,无法获得期望的配准性能,即难以准确地实现图像配准。此外,还存在配准方法提出采用注意机制来更好地表征空间关系,但由于其过于关注图像间的长程依赖,忽视了配准的局部性,即忽略了在配准任务中需要在局部区域找到相应的体素。
本申请实施例为了解决上述问题,提供了一种图像配准方法,其通过将图像中每个体素处的位移向量转换为基于图像特征确定的多个位移基向量的组合的位移场,更有效地从图像特征中学习到空间关系,从而实现准确的图像配准。
本申请实施例提供的方法相比于传统的图像配准方法而言,通过采用基于无监督神经网络学习的图像配准模型,极大地提高了配准速度和配准精度,同时缓解了图像数据集标注信息稀缺的问题。此外,与现有的基于无监督学习的配准方法相比,本申请实施例提供的方法通过促进从图像表示到空间关系的映射,而不是将图像表示直接转换为空间对应,提高了配准网络的可解释性,且减小了图像配准所需的计算量。
本申请实施例提供的方法,在多个尺度下提取待配准图像对的特征图,在每个尺度下,将待配准图像对中移动图像包括的每个体素的位移向量转换为多个位移基向量的组合,该多个位移基向量的组合是基于相应特征图中与该体素对应的特征向量确定的,以从成对特征图中学习位移场,并基于所有尺度下的位移场生成最终位移场,如此充分利用不同尺度下的图像特征信息实现高效准确的图像配准。通过本申请实施例的方法能够基于不同尺度学习从粗到细的位移场,通过对所有尺度下的位移场进行融合,在不引入密集的连续网络计算的情况下,充分利用不同表示子空间的潜在信息。
图2A是示出本申请实施例提供的图像配准方法200的流程图。图2B是示出本申请实施例提供的图像配准方法的多尺度配准网络的示意图。图2C是示出本申请实施例中的多尺度配准网络的应用的示意图。应理解,本申请实施例提供的图像配置方法可以由计算机设备执行,该计算机设备具体可以为终端设备或服务器。
在步骤201中,可以获取待配准的固定图像和移动图像。
如上所述,图像配准的目的在于,对待配准图像对中的移动图像中的每个体素(此处的体素对应于三维图像,例如,替代地,在二维图像中为像素)的坐标进行空间变换,以使其与固定图像中的对应体素的坐标对齐,即确定移动图像中的每个体素的位移向量,从而确定移动图像的位移场。其中,体素是指体积元素(Volume Pixel),其在概念上类似于二维图像空间中的最小单位——像素,是三维图像空间的最小单位,体素通常用于三维成像与医学影像等领域,而像素用在二维计算机图像的影像数据上。
对于如上所述的诸如医学图像配准的场景,考虑到随着成像技术从平面图像到三维图像的发展,对图像的处理也需要适应于所获得的三维图像。因此,在本申请的后文中,主要针对三维图像的图像配准场景进行描述,但这并不意味着对本申请实施例提供的图像配准方法的适用范围的限制,本申请实施例提供的图像配准方法同样可以适用于其他更高或更低维度的图像处理,以下对于三维图像的操作仅用作示例而非限制。
可选地，在本申请实施例中，针对三维医学图像配准场景（其中移动图像和固定图像为三维医学图像），可以将所获取的待配准图像对表示为 $\{M, F\}$，其中，$M, F \in \mathbb{R}^{D\times W\times H}$，M为移动图像，F为固定图像，D、W和H分别表示移动图像和固定图像的深度、宽度和高度。因此，图像配准过程可以表示为：针对移动图像M中的每个体素，在固定图像F中找到最相似的体素；移动图像M与固定图像F之间的映射函数通常可以用由其中每对体素之间的位移向量构成的形变场φ来表示。
因此，本申请实施例提供的图像配准方法旨在确定移动图像M的形变场φ。通过采用空间变换网络，根据所确定的形变场φ，将移动图像M映射到固定图像F上，使得形变后的移动图像 M∘φ 可以配准至固定图像F，其中，形变场φ的维度表示三维空间维度，其对应于移动图像M中的每个体素的三维空间变换。具体地，形变场φ可以被表示为 φ=Id+u，其中，Id代表对于移动图像M的恒等变换，u表示位移场，其是由待配准图像对{M,F}中的每对体素之间的位移向量构成的。例如，对于大小为[W,H]的二维图像，其位移场大小为[W,H,2]，而对于大小为[D,W,H]的三维图像，其位移场大小为[D,W,H,3]，其中，位移场可以表示每个体素/像素在各个方向（例如，x、y、z轴）上的位移。空间变换网络可以根据位移场生成归一化后的采样网格，利用该网格对移动图像进行采样，可以获得配准后的移动图像。
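
As an illustration of how a displacement field u (and the resulting deformation field φ = Id + u) can be applied to a moving volume through a spatial transformer, the following PyTorch-style sketch resamples the moving image on the displaced sampling grid. The function name `warp`, the tensor layout (B, C, D, W, H), and the use of `grid_sample` are editorial assumptions for illustration, not the patent's concrete implementation.

```python
import torch
import torch.nn.functional as F

def warp(moving: torch.Tensor, displacement: torch.Tensor) -> torch.Tensor:
    """Resample a moving volume at phi = Id + u (one spatial-transformer step).

    moving:        (B, C, D, W, H) image volume.
    displacement:  (B, 3, D, W, H) per-voxel displacements along (d, w, h).
    """
    B, _, D, W, H = displacement.shape
    # Identity grid of voxel coordinates: the "Id" part of phi = Id + u.
    coords = torch.meshgrid(
        torch.arange(D, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        torch.arange(H, dtype=torch.float32),
        indexing="ij",
    )
    identity = torch.stack(coords).to(displacement.device)   # (3, D, W, H)
    phi = identity.unsqueeze(0) + displacement                # (B, 3, D, W, H)

    # Normalize coordinates to [-1, 1] as grid_sample expects (assumes D, W, H > 1).
    sizes = torch.tensor([D, W, H], dtype=torch.float32, device=displacement.device)
    phi = 2.0 * phi / (sizes.view(1, 3, 1, 1, 1) - 1.0) - 1.0

    # Reorder channels to grid_sample's (x, y, z) convention and sample.
    grid = phi.permute(0, 2, 3, 4, 1)[..., [2, 1, 0]]         # (B, D, W, H, 3)
    return F.grid_sample(moving, grid, mode="bilinear", align_corners=True)
```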
在步骤202中,可以获得所述固定图像和所述移动图像各自在多个尺度下的特征图。
考虑到对于同一图像,可以从其不同尺度下的特征图获得不同的图像信息。在本申请实施例中,获得所述固定图像和所述移动图像各自在多个尺度下的特征图,可以包括:利用基于卷积神经网络的编码器,通过所述编码器中的不同卷积块提取固定图像和移动图像各自在不同尺度下的特征图,获得所述固定图像和所述移动图像各自在多个尺度下的特征图。
卷积神经网络通过逐层抽象的方式来提取目标的特征，其中，浅层网络特征图分辨率高，感受野较小，几何信息表征能力强，语义表征能力弱，适合处理小目标；深层网络感受野大，语义信息表征能力强，但是特征图分辨率低，几何信息表征能力弱，适合处理大目标。对于三维医学图像，低层次的细节特征图捕获了丰富的空间信息，突出了器官的边界；而高层次的语义特征图包含了位置信息，定位了器官所在的位置。因此，移动图像M和固定图像F可以被送入基于卷积神经网络的同一编码器（共享权重）进行特征提取，使得编码器可以通过不同的卷积块提取出一序列不同尺度下的中间特征映射，即获得多个尺度下的特征图。例如，图2B中示出了四个尺度（尺度数量L=4）的情况，其中，B表示该批待配准图像对的数量，如图2B所示，编码器可以通过不同的卷积块（卷积核个数分别为C1、C2、C3和C4），基于移动图像M和固定图像F分别提取四个尺度下的特征图（对于移动图像M为 {f_M^1, f_M^2, f_M^3, f_M^4}，并且对于固定图像F为 {f_F^1, f_F^2, f_F^3, f_F^4}），其中，对于同一尺度l，移动图像和固定图像的特征图的维度相同，例如对于尺度l=3，f_M^3 和 f_F^3 的维度都为B*D3*W3*H3*C3。
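
A minimal sketch of such a shared-weight multi-scale encoder is given below: each convolution block halves the spatial size and emits one feature map, and the same encoder instance is applied to both images. The channel counts, kernel sizes, and normalization choices are editorial assumptions and are not prescribed by the patent text.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Shared-weight encoder: each conv block halves the spatial size and
    emits one feature map, giving L feature maps per input image."""

    def __init__(self, in_channels=1, channels=(16, 32, 64, 128)):
        super().__init__()
        blocks, prev = [], in_channels
        for c in channels:
            blocks.append(nn.Sequential(
                nn.Conv3d(prev, c, kernel_size=3, stride=2, padding=1),
                nn.GroupNorm(4, c),
                nn.ReLU(inplace=True)))
            prev = c
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)   # feature map at scale l, spatial size ~ input / 2**l
        return feats

# The same encoder instance (shared weights) is applied to both images:
#   feats_m = encoder(moving)
#   feats_f = encoder(fixed)
```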
应当理解,本申请实施例中仅以尺度为4的情况作为示例来描述图像配 准过程,但这并不是对于尺度数量的限制,本申请实施例提供的图像配准方法同样可以采用其他数量的尺度实现图像配准。
根据本申请实施例,所述固定图像或所述移动图像在特定尺度下的特征图的尺寸,可以为按照相应比例对所述固定图像或所述移动图像进行缩小后的尺寸。可选地,为了增加感受野,尺度l下的特征图的尺寸可以按比例缩小到原始图像的尺寸的1/2l。例如,对于上述L=4的情况,尺度l=3下的移动图像的特征图的尺寸可以移动图像的原始尺寸的1/8,即D3*W3*H3=(D*W*H)/8。当然,对于特征图的尺寸的缩小还可以基于其他方式,上述按比例缩小的方式在本申请中仅用作示例而非限制。
因此,可以从固定图像和移动图像在多个尺度下的成对特征图中学习多个位移场并且对在所有尺度下学习的位移场进行融合,以结合低层次的细节特征图和高层次的语义特征图的特性,获得更准确的位移场。
在步骤203中,可以对于所述多个尺度中的每个尺度,根据所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述尺度对应的位移场,所述位移场用于指示在所述尺度下从所述移动图像到所述固定图像的映射。
如上所述,可以针对每个尺度下的成对特征图确定一个位移场,其中,该位移场由待配准图像对中的体素对间的位移向量构成。根据本申请实施例,在所述位移场中,所述移动图像中的每个体素到所述固定图像中的相应体素的位移向量可以为多个位移基向量的组合,所述多个位移基向量可以是基于所述移动图像在所述尺度下的特征图中与所述体素对应的特征向量获得的。
与将图像表示直接转换为空间对应的方法不同,为了促进从图像表示到空间关系的映射,并且提高图像配准网络中的空间转换的可解释性,在本申请实施例提供的图像配准方法中,对于如图2B所示的位移场确定部分,根据线性代数理论,将两个体素之间的位移向量分解为多个位移基向量的组合表示,从而避免复杂且可解释性差的空间对应直接转换。根据本申请实施例,根据所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述尺度对应的位移场,可以包括如图3A所示的步骤。
图3A是示出本申请实施例中根据待配准图像对的特征图确定位移场的流程图。图3B是示出本申请实施例中从待配准图像对的特征图中确定位移场的示意图。图3C是示出本申请实施例中从待配准图像对的特征图中确定位移场的示意性流程框图。
如图3A所示,在步骤2031中,可以对于所述移动图像在所述尺度下的特征图中的每个特征向量,基于所述特征向量生成多个位移基向量。
可选地,对于同一尺度下的移动图像的特征图和固定图像的特征图,上述多个位移基向量的组合表示,可以通过图3B所示的采用多头注意力机制的两个独立分支来获得。具体地,在本申请实施例中,所述多个位移基向量及其各自的权重可以是利用K个注意力头基于所述特征向量生成的,其中,利 用每个注意力头可以生成N个位移基向量,其中,K可以是大于1的整数,N可以是正整数。
如图3B所示,利用每个注意力头生成N个位移基向量,可以对应于图3B中的上半部分的分支,该分支针对移动图像的特征图进行操作。在该分支中,可以采用线性投影,将移动图像的特征图中的每个特征向量转换为N个(例如,图3B中示为5个)位移基向量,其中,每个位移基向量可以由三个元素组成,这三个元素可以分别对应于三个空间方向上的位移,即x、y、z轴上的位移。为了获得来自不同表示子空间的信息,本申请实施例提供的图像配准方法基于多头注意力机制,对于每个特征向量采用K个注意力头,并且每个注意力头的位移基向量数为N,从而,为每个体素生成K×N个位移基向量。具体地,对于移动图像M在尺度l下的特征图其可以被视作包括B*D3*W3*H3个C3维的特征向量,对于其中每个特征向量,在每个注意力头中,通过一个全连接层输出一个N×3维向量,其可以视作N个三维位移基向量,因此,通过K个注意力头,可以针对每个特征向量提取K×N个位移基向量。其中,该分支仅使用移动图像的特征图来生成位移基向量,这足以提取出最具代表性的位移基向量,来表示每个体素最可能的形变方向。
在步骤2032中,可以基于所述固定图像和所述移动图像各自在所述尺度下的特征图,针对所述特征向量,确定所述多个位移基向量各自的权重。
如图3B所示,确定多个位移基向量各自的权重可以对应于图3B中的下半部分的分支,该分支针对固定图像和移动图像的特征图进行操作。在本申请实施例中,对于所述K个注意力头中的每个注意力头,利用所述注意力头,基于所述固定图像和所述移动图像各自在所述尺度下的特征图的拼接,生成的N个位移基向量各自的权重。
可选地,在此分支中,通过对移动图像和固定图像的拼接(图中示为C)特征图进行线性投影,来学习每一个注意力头对应的N个位移基向量各自的注意力权重。具体地,在此分支中,可以通过线性层和归一化指数函数(softmax函数)来学习移动图像和固定图像的体素之间的相似性,从而确定该N个位移基向量各自的注意力权重(例如,在图3B中,上述5个位移基向量所对应的注意力权重分别为0.2、0.2、0.4、0.1和0.1)。
需要注意,上述两种线性投影都是独立地针对每对特征向量进行的,与标准CNN或Transformer块相比,产生的参数较少。此外,由于通过基于CNN的编码器,已经将体素的附近体素的潜在信息整合到与该体素相对应的特征向量中,因此,相似性度量不再局限于同一位置的体素,感受野随着特征图分辨率的降低而扩大。
因此,基于上述两个独立分支,可以确定与特征向量对应的体素的位移向量。
在步骤2033中,可以对于所述移动图像在所述尺度下的特征图中的每个 特征向量,基于所述特征向量对应的多个位移基向量及其权重,确定所述尺度对应的位移场。
基于移动图像中的特征向量确定其中每个体素最可能发生形变的方向,并基于移动图像和固定图像之间的相似性,确定最可能发生形变的每个方向上的相对位移量,进而可以基于这些方向和在这些方向上的相对位移量来确定相应体素的位移向量。
在本申请实施例中,基于所述特征向量对应的多个位移基向量及其权重,确定所述尺度对应的位移场,可以包括:对于所述K个注意力头,基于N个位移基向量与其各自的权重的相应乘积的平均值,确定所述尺度对应的位移场;其中,所述移动图像中的每个体素到所述固定图像中的相应体素的位移向量可以为所述多个位移基向量的加权和。
可选地,上述多个位移基向量的组合表示可以是多个位移基向量的加权和,其中,这些位移基向量的权重是通过对同一特征向量对应的所有位移基向量的注意力权重进行归一化后确定的。
如上所述，尺度l（l=1...L）对应的位移场中的每个位移向量 $u^l_k$（其中，k表示该位移向量在该位移场中的编号）可以通过对所有注意力头中注意力权重与相应位移基向量的乘积之和取平均值来确定。例如，可以表示为：

$$u^l_k = \frac{1}{K}\sum_{j=1}^{K}\sum_{i=1}^{N} w_{i,j}\, b_{i,j},\qquad b_{i,j} = fc_1\!\big(f^l_M(k)\big),\qquad w_{i,j} = \mathrm{softmax}\!\Big(fc_2\big(\big[f^l_F(k),\, f^l_M(k)\big]\big)\Big)$$

其中，$b_{i,j}$ 为第j个注意力头中的第i个位移基向量，$w_{i,j}$ 为该位移基向量的注意力权重，[·]表示拼接操作，fc1和fc2分别表示两个分支的映射函数。因此，通过确定位移场中的每个位移向量 $u^l_k$，可以确定尺度l下的位移场 $u^l$（l=1...L）。
在本申请实施例中,如参考图3A至图3C所描述的确定每个尺度对应的位移场的过程中,通过使用非全局的多头注意力机制,在遵循了配准的局部性的同时,相比采用标准CNN或Transformer块的方案具有更少的参数。
图3C以流程框图的形式,针对移动图像和固定图像的特征图的维度,示出了基于移动图像和固定图像在尺度l下的特征图确定位移场ul的过程。如图3C所示,该流程框图中的两条过程流分别对应于上述两个独立分支。其中,对于左侧过程流,移动图像的特征图通过线性投影被转换为K个N×3维的位移基向量,因此,其特征图的维度从B*Dl*Wl*Hl*Cl改变为B*Dl*Wl*Hl*(K*N*3),而在右侧过程流,移动图像和固定图像的特征图经过拼接,拼接后的特征图的维度为B*Dl*Wl*Hl*2Cl,该拼接后的特征图经过上述线 性投影,被转换为上述K个N×3维的位移基向量的注意力权重,因此该拼接后的特征图改变为B*Dl*Wl*Hl*(K*N*1)。通过对这两个过程流各自的结果进行重塑变换(诸如转置变换等),可以获得K个注意力头中的位移场,其维度为B*Dl*Wl*Hl*K*3。因此,通过如上所述在所有注意力头中对位移场取平均,可以确定尺度l下的位移场ul,其维度为B*3*Dl*Wl*Hl
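
A minimal sketch of the two-branch, multi-head computation described above (basis vectors from the moving-image features only; softmax attention weights from the concatenated features; averaging over the K heads) might look as follows. The class and variable names, and the default values K=4 and N=5, are editorial assumptions for illustration.

```python
import torch
import torch.nn as nn

class DisplacementHead(nn.Module):
    """Per-voxel displacement from K x N displacement basis vectors.

    Branch 1 (moving features only)  -> K*N*3 basis-vector coordinates.
    Branch 2 (concat of both images) -> K*N attention weights (softmax over N).
    Output: displacement field averaged over the K heads.
    """

    def __init__(self, channels: int, heads: int = 4, bases: int = 5):
        super().__init__()
        self.heads, self.bases = heads, bases
        self.to_bases = nn.Linear(channels, heads * bases * 3)
        self.to_weights = nn.Linear(2 * channels, heads * bases)

    def forward(self, feat_m, feat_f):
        # feat_m, feat_f: (B, C, D, W, H) feature maps at one scale.
        B, C, D, W, H = feat_m.shape
        fm = feat_m.permute(0, 2, 3, 4, 1).reshape(B, -1, C)           # (B, V, C)
        ff = feat_f.permute(0, 2, 3, 4, 1).reshape(B, -1, C)
        bases = self.to_bases(fm).view(B, -1, self.heads, self.bases, 3)
        weights = self.to_weights(torch.cat([ff, fm], dim=-1))
        weights = weights.view(B, -1, self.heads, self.bases).softmax(dim=-1)
        # Weighted sum over the N bases, then mean over the K heads.
        disp = (weights.unsqueeze(-1) * bases).sum(dim=3).mean(dim=2)  # (B, V, 3)
        return disp.view(B, D, W, H, 3).permute(0, 4, 1, 2, 3)         # (B, 3, D, W, H)
```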
在步骤204中,可以基于所述多个尺度各自对应的位移场生成最终位移场,并基于所述最终位移场对所述移动图像进行变换处理,以使变换后的移动图像与所述固定图像配准。
在针对多个尺度下的特征图确定了对应的多个位移场后,这些不同尺度下的位移场不能直接作为最终配准结果,因为细尺度的位移场感受野有限,无法表征大的形变,而粗尺度的位移场太粗糙,无法为原始图像提供精确的体素级位移向量。因此,在本申请实施例提供的图像配准方法中,可以采用细化融合网络,对这些位移场进行融合,以结合低层次的细节特征图和高层次的语义特征图的特性,获得更准确的最终位移场。
在本申请实施例中,基于所述多个尺度各自对应的位移场生成最终位移场,可以包括:根据尺度大小,对所述多个尺度各自对应的位移场进行级联融合,以将所述多个尺度各自对应个位移场转换为所述移动图像的原始尺寸下的最终位移场。
在本申请实施例提供的图像配准方法中,可以采用如图4A所示的级联融合网络来基于多个位移场生成最终位移场,该级联融合网络重复应用融合块和卷积头,以将L个尺度下的位移场转换为原始分辨率下的最终位移场。
图4A是示出本申请实施例中的对多个尺度各自对应的位移场的级联融合的示意图。图4B是示出本申请实施例的级联融合中的每级融合的流程图。图4C是示出本申请实施例的级联融合中的第l级融合的示意图。
可选地,对于L=4的情况,可以按照尺度l从大到小的顺序进行级联融合,该级联融合包括L-1级融合,其中,将第l级融合的融合块的输出特征图记作gl,对于第l级融合,可以基于上级融合的输出gl+1和该级融合对应的位移场(表示为B*3*Dl*Wl*Hl)、以及移动图像和固定图像各自的特征图通过融合块生成该级融合的融合结果gl。如图4A所示,对于L=4的情况,在4个尺度下确定的位移场可以经过3级融合得到最终位移场B*3*D*W*H。其中,对于第L-1级融合,其输入中除了该级融合对应的位移场以及移动图像和固定图像各自的特征图外,不存在上级融合的输入,而是还包括对尺度L下的位移场的特征提取gL
在本申请实施例中,根据尺度大小,对所述多个尺度各自对应的位移场进行级联融合,可以包括如图4B所示的步骤2041-2044。
在步骤2041中,可以在所述级联融合中的每级融合中,对上级融合输出的具有特定通道数的特征图进行上采样,得到第一特征图。
可选地,如图4C中的(a)所示,可以对上级融合输出的特征图gl+1进行上采样,以得到该第一特征图,例如,对于尺度l下对应的特征图的尺寸按比例缩小到原始图像的尺寸的1/2l,可以对上级融合的输出gl+1进行双线性插值上采样,得到的第一特征图的维度为B*3*Dl*Wl*Hl。应当理解,其他可以实现相同效果的数据处理方式同样可以适用于本申请实施例的方法,该上采样方法在本申请实施例中仅用作示例而非限制。
在步骤2042中,可以对该级融合对应的位移场进行特征提取,将所述位移场转换为具有所述特定通道数的第二特征图。
可选地,可以通过预定卷积块将尺度l上的位移场ul转换为具有特定通道数C(例如,C=128,该固定通道数可以根据实际需要进行设置)的特征图hl,即第二特征图。类似地,在上述第L-1级融合中,对尺度L下的位移场的特征提取gL也可以是通过这样的方式获得的。其中,作为示例,该预定卷积块可以包含三个卷积层(例如,通道数分别为16、64和128,每个卷积层的尺度为1),并且其后可以跟随组归一化和激活函数(例如,ReLU激活函数)。需要注意的是,不同尺度的卷积块不共享权值,由尺度从大到小,它们的卷积核大小可以被设置为诸如(5,5,5)、(5,5,3)、(5,3,3)、(3,3,3)。应当理解,该预定卷积块的具体参数可以根据实际需要设置,上述具体参数设置在本申请实施例中仅用作示例而非限制。
在步骤2043中,可以将所述第一特征图与所述第二特征图相加,并与所述固定图像和所述移动图像各自在该级融合对应的尺度下各自的特征图进行拼接。
可选地，可以将上级融合的结果结合当前融合对应的位移场信息和特征图，来生成融合结果。例如，可以将上级融合的结果gl+1加到从当前融合对应的位移场中提取的特征图hl中（其相加的结果的维度仍为B*3*Dl*Wl*Hl）。此后，可以将相加的结果与潜在表示 f_M^l 和 f_F^l 进行拼接，该拼接的结果可以表示为 [upsample(g^{l+1}) + h^l, f_M^l, f_F^l]，其中，upsample(·)表示对(·)进行上采样。因此，通过对该拼接的结果应用如图4C中的(b)所示的卷积块提取特征信息，可以得到该级融合的图像信息gl。
在步骤2044中,可以对所述拼接的结果进行特征提取,生成具有所述特定通道数的特征图,作为该级融合的输出。
如图4C中的(b)所示,融合块的最后一个组成部分可以是另一个卷积块,其可以将输出的高阶潜在表示的通道数(C+2Cl)减小为固定通道数C,该卷积块可以具有与如上所述的预定卷积块具有相同的结构和参数,此处不再赘述,但应理解,其他可以实现相同目的或效果的方式同样可以适用于本申请实施例提供的图像配准方法。
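
One level of the cascade fusion described above (upsample the coarser output g^{l+1}, lift the displacement field u^l to C channels as h^l, add the two, concatenate with the two scale-l feature maps, and reduce back to C channels) could be sketched as follows. The layer counts, kernel sizes, channel numbers, and GroupNorm settings are editorial assumptions rather than the patent's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_chs=(16, 64, 128), kernel=3):
    """A small stack of Conv3d + GroupNorm + ReLU layers."""
    layers, prev = [], in_ch
    for c in out_chs:
        layers += [nn.Conv3d(prev, c, kernel, padding=kernel // 2),
                   nn.GroupNorm(4, c), nn.ReLU(inplace=True)]
        prev = c
    return nn.Sequential(*layers)

class FusionLevel(nn.Module):
    """One level of the cascade fusion (C = fixed channel number, e.g. 128)."""

    def __init__(self, feat_channels, C=128):
        super().__init__()
        self.lift = conv_block(3, (16, 64, C))                # u^l -> h^l with C channels
        self.reduce = conv_block(C + 2 * feat_channels, (16, 64, C))

    def forward(self, g_next, disp_l, feat_m_l, feat_f_l):
        # g_next: (B, C, D', W', H') output of the coarser level (or lifted u^L).
        g_up = F.interpolate(g_next, size=disp_l.shape[2:],
                             mode="trilinear", align_corners=True)
        h_l = self.lift(disp_l)                               # (B, C, Dl, Wl, Hl)
        x = torch.cat([g_up + h_l, feat_m_l, feat_f_l], dim=1)
        return self.reduce(x)                                 # (B, C, Dl, Wl, Hl)

# A final Conv3d(C, 3, kernel_size=3, padding=1) applied to the output of the
# finest fusion level would yield the 3-channel final displacement field.
```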
在本申请实施例中,根据尺度大小,对所述多个尺度各自对应的位移场进行级联融合,还可以包括:对于所述级联融合中的第一级融合输出的具有 所述特定通道数的特征图进行特征提取,生成所述最终位移场。
如上所述,对于L=4的情况,通过L-1个融合块的级联融合,需要输出通道数为3的最终位移场。可选地,可以对第一级融合输出的特征图g1应用类似的卷积头(例如,包含三个解码块的卷积头,每个解码块有两个连续结构,包含一个卷积层和一个激活函数(例如,ReLU激活函数)),从而生成最终位移场,其维度为B*3*D*W*H。
在本申请实施例提供的图像配准方法中,通过采用如参考图4A至图4C所描述的细化融合网络,对在多个尺度下确定的中间位移场进行融合,生成最终位移场,以基于低层次的细节特征图和高层次的语义特征图的特性,结合位移场信息和多尺度下的特征图,提供高分辨率的大形变映射能力。
因此,如上参考步骤201至204所描述的图像配准方法可以表示为如图2C所示的示意性流程图。其中,通过将待配准图像对输入本申请实施例中的多尺度配准网络,可以得到待配准图像对中的移动图像的形变场,因此,通过对该移动图像施加所得到的形变场,可以输出最终的配准图像。
在本申请实施例提供的图像配准方法中,上述多尺度配准网络的训练可以基于对配准损失函数的最优化。其中,通过在该多尺度配准网络中基于训练数据针对配准损失函数的最优化,可以确定该多尺度配准网络的最优参数。所确定的最优参数可以在后续图像配准任务中直接应用,而无需每次针对实时任务重新进行最优化计算,本申请实施例的图像配准方法中诸如卷积等操作所涉及的参数,都可以通过基于对配准损失函数的最优化而预先确定。
因此,根据本公开的实施例,所述最终位移场的生成可以基于对配准损失函数的最优化,所述配准损失函数可以包括用于度量变换后的移动图像(即基于最终位移场对移动图像进行变换处理得到的)与所述固定图像之间的相似性的第一项、以及用于对位移场施加惩罚以惩罚所述移动图像的局部空间变化的第二项。
可选地，本申请实施例提供的图像配准方法可以使用大多数的配准损失函数，例如以下配准损失函数：

$$\mathcal{L}(F, M, \varphi) = \mathcal{L}_{sim}(F, M\circ\varphi) + \lambda\,\mathcal{L}_{smooth}(\varphi)$$

其中，λ为正则化权衡参数。
其中，第一项 $\mathcal{L}_{sim}(F, M\circ\varphi)$ 通过局部归一化互相关，来度量变换后的移动图像与固定图像之间的相似性，该项可以表示为：

$$\mathcal{L}_{sim}(F, M\circ\varphi) = -\sum_{p\in\Omega}\frac{\Big(\sum_{q\in P}\Delta F(q)\,\Delta[M\circ\varphi](q)\Big)^{2}}{\Big(\sum_{q\in P}\Delta F(q)^{2}\Big)\Big(\sum_{q\in P}\Delta[M\circ\varphi](q)^{2}\Big)}$$

其中，Ω为整个体素集，P表示体素p的近邻体素。
如果用 $\bar F(p)$ 和 $\overline{[M\circ\varphi]}(p)$ 表示固定图像和变换后的移动图像在体素p的近邻体素的值的期望，则ΔF(q)和 $\Delta[M\circ\varphi](q)$ 可以分别定义为 $F(q)-\bar F(p)$ 和 $[M\circ\varphi](q)-\overline{[M\circ\varphi]}(p)$，以排除绝对灰度值带来的影响，从而更好地体现待配准图像对之间的结构差异。
在本公开的实施例中，第二项 $\mathcal{L}_{smooth}(\varphi)$ 可以是对形变场施加的正则化，以惩罚移动图像中的局部空间变化，该项可以表示为：

$$\mathcal{L}_{smooth}(\varphi) = \sum_{p\in\Omega}\left\lVert \nabla u(p)\right\rVert^{2}$$

其中，$\nabla u(p)$ 表示两个相邻体素之间的位移向量的近似空间梯度。
可选地，可以通过 $u(p_x+1, p_y, p_z)-u(p_x, p_y, p_z)$ 计算得到 $\partial u(p)/\partial x$ 的近似值，并分别应用类似的运算得到 $\partial u(p)/\partial y$ 和 $\partial u(p)/\partial z$。因此，通过如上约束位移量的大小，可以保证移动图像不会产生太大的畸变。
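
A hedged sketch of such a registration loss (a local normalized cross-correlation similarity term plus a displacement-gradient smoothness term, combined with a trade-off weight λ) is given below. The window size, the small constant eps, and the exact normalization are editorial choices for illustration, not the patent's specification.

```python
import torch
import torch.nn.functional as F

def local_ncc_loss(fixed, warped, win=9, eps=1e-5):
    """Negative local normalized cross-correlation over a cubic window."""
    B, C, D, W, H = fixed.shape
    n = C * win ** 3
    kernel = torch.ones(1, C, win, win, win, device=fixed.device)
    pad = win // 2
    sums = lambda x: F.conv3d(x, kernel, padding=pad)          # windowed sums
    f_sum, w_sum = sums(fixed), sums(warped)
    f2_sum, w2_sum = sums(fixed * fixed), sums(warped * warped)
    fw_sum = sums(fixed * warped)
    cross = fw_sum - f_sum * w_sum / n                         # sum of dF * d(M o phi)
    f_var = f2_sum - f_sum * f_sum / n
    w_var = w2_sum - w_sum * w_sum / n
    ncc = cross * cross / (f_var * w_var + eps)
    return -ncc.mean()

def smoothness_loss(disp):
    """Penalize approximate spatial gradients of the displacement field (B, 3, D, W, H)."""
    dd = disp[:, :, 1:] - disp[:, :, :-1]
    dw = disp[:, :, :, 1:] - disp[:, :, :, :-1]
    dh = disp[:, :, :, :, 1:] - disp[:, :, :, :, :-1]
    return dd.pow(2).mean() + dw.pow(2).mean() + dh.pow(2).mean()

def registration_loss(fixed, warped, disp, lam=1.0):
    return local_ncc_loss(fixed, warped) + lam * smoothness_loss(disp)

# The total objective described below additionally adds copies of this loss,
# weighted by beta_l, for the displacement field determined at each scale.
```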
如上所述,本申请实施例提供的图像配准方法通过在每个尺度下对位移场施加惩罚,为每个尺度下的位移场确定以及特征提取器提供直接指导。
应当注意,为了计算目标函数,需要将每个尺度下确定的位移场上采样到原始尺寸,以使移动图像形变。
因此，本申请实施例提供的图像配准方法所采用的总配准损失函数可以表示为：

$$\mathcal{L}_{total} = \mathcal{L}(F, M, \varphi) + \sum_{l=1}^{L}\beta_{l}\,\mathcal{L}\big(F, M, \varphi^{l}\big)$$

其中，βl表示尺度l下的中间配准损失函数的权重，$\varphi^{l}$ 表示由尺度l下确定的位移场上采样到原始尺寸后得到的形变场。
如上所述,通过在每个尺度下的位移场确定的输出上施加辅助损失,如图2B所示,可以为充分利用每个尺度的表示子空间提供额外的指导,以避免移动图像中的过大畸变。
图5是示出本申请实施例中的不同方法下的图像配准结果、及不同尺度下的形变场的可视化结果的示意图。其中,图5中的(a)示出了采用不同方法下的医学图像配准结果对比,图5中的(b)示出了采用本申请实施例提供的图像配准方法在不同尺度下的形变场的可视化结果。
图5中(a)和(b)所示的图像配准结果为对大脑的磁共振成像(MRI)图像的配准结果,其中,不同深度的曲线表示不同大脑结构的边界(例如,MRI图像中的最下方呈对称分布的两个封闭曲线对应于尾状核(caudate)的边界)。如图5中(a)所示,与其他图像配准方法(诸如SyN、NiftyReg、VoxelMorph等方法)相比,本申请实施例中的图像配准方法能够更好地保留医学图像的细节信息,并且形变之后的移动图像(图中标记为“本公开”)能和不同大脑结构的分割标注吻合得更好。
图5中(a)和(b)中的形变场的不同颜色代表了形变的程度,也就是该体素点的位移量,其中,颜色越深表示位移量越大,而形变场中的曲线变化方向指示体素点的位移方向。如图5中(b)所示,随着尺度由小变大,所得到的形变场逐渐模糊,分辨率逐渐降低。通过本申请实施例的图像配准方法,可以结合低层次的细节特征图和高层次的语义特征图的特性,获得更准确的位移场。
图6是示出本申请实施例的图像配准方法的应用的示意图。
如图6所示,本申请实施例提供的图像配准方法可以作为模型直接应用。例如,在图6中的(a)中,用户可以上传成对的待配准的图像(例如,在前端A上),而后台在接收到待配准图像对后,可以使用基于本申请实施例提供的图像配准方法所训练好的模型,直接对该待配准图像对进行配准(后端),并将配准结果输出(例如,在前端B上)。
如图6中的(b)所示,本申请实施例提供的图像配准方法还可以通过智能开放实验室平台部署,用户可通过向显示界面中拖入相关模块(例如,图像配准模块),而直接使用该图像配准方法并输出配准后的图像。
图7是示出本申请实施例中的图像配准装置700的示意图。
所述图像配准装置700可以包括图像获取模块701、特征图提取模块702、位移场确定模块703和位移场合成模块704。
图像获取模块701可以被配置为获取待配准的固定图像和移动图像。
特征图提取模块702可以被配置为获得所述固定图像和所述移动图像各自在多个尺度下的特征图。
位移场确定模块703可以被配置为对于所述多个尺度中的每个尺度,根据所述固定图像和所述移动图像在所述尺度下各自的特征图,确定所述尺度对应的位移场,所述位移场用于指示在所述尺度下从所述移动图像到所述固定图像的映射,在所述位移场中,所述移动图像中的每个体素到所述固定图像中的相应体素的位移向量为多个位移基向量的组合,所述多个位移基向量是基于所述移动图像在所述尺度下的特征图中与所述体素对应的特征向量获得的。
位移场合成模块704可以被配置为基于在所述多个尺度下确定的多个位移场生成最终位移场,并基于所述最终位移场对所述移动图像进行变换处理,以使变换后的移动图像与所述固定图像配准。
可选的,所述位移场确定模块703具体可以被配置为:
对于所述移动图像在所述尺度下的特征图中的每个特征向量,基于所述特征向量生成所述特征向量对应的多个位移基向量;基于所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述多个位移基向量各自的权重;
对于所述移动图像在所述尺度下的特征图中的每个特征向量,基于所述特征向量对应的多个位移基向量及其权重,确定所述尺度对应的位移场。
可选的,所述多个位移基向量及其各自的权重是利用K个注意力头基于所述特征向量生成的,所述K为大于1的整数;
其中,利用每个所述注意力头生成N个位移基向量;所述N为大于或等于1的整数;
其中,对于所述K个注意力头中的每个注意力头,利用所述注意力头, 基于所述固定图像和所述移动图像各自在所述尺度下的特征图的拼接,生成所述N个位移基向量各自的权重。
可选的,所述位移场确定模块703具体可以被配置为:
对于所述K个注意力头,基于所述N个位移基向量与其各自的权重的相应乘积的平均值,确定所述尺度对应的位移场;
其中,所述移动图像中的每个体素到所述固定图像中的相应体素的位移向量为所述多个位移基向量的加权和。
可选的,所述特征图提取模块702具体可以被配置为:
利用基于卷积神经网络的编码器,通过所述编码器中的不同卷积块提取所述固定图像和所述移动图像各自在不同尺度下的特征图,获得所述固定图像和所述移动图像各自在多个尺度下的特征图;
其中,所述固定图像或所述移动图像在特定尺度下的特征图的尺寸,是按照相应比例对所述固定图像或所述移动图像进行缩小后的尺寸。
可选的,所述位移场合成模块704具体可以被配置为:
根据尺度大小,对所述多个尺度各自对应的位移场进行级联融合,以将所述多个尺度各自对应的位移场转换为所述移动图像的原始尺寸下的最终位移场。
可选的,所述位移场合成模块704具体可以被配置为:
在所述级联融合中的每级融合中,对上级融合输出的具有特定通道数的特征图进行上采样,得到第一特征图;
对该级融合对应的位移场进行特征提取,将所述位移场转换为具有所述特定通道数的第二特征图;
将所述第一特征图与所述第二特征图相加,并与所述固定图像和所述移动图像各自在该级融合对应的尺度下的特征图进行拼接;
对所述拼接的结果进行特征提取,生成具有所述特定通道数的特征图,作为该级融合的输出。
可选的,所述位移场合成模块704具体可以被配置为:
对于所述级联融合中的第一级融合输出的具有所述特定通道数的特征图进行特征提取,生成所述最终位移场。
可选的,所述最终位移场的生成基于对配准损失函数的最优化,所述配准损失函数包括用于度量所述变换后的移动图像与所述固定图像之间的相似性的第一项、以及用于对位移场施加惩罚以惩罚所述移动图像的局部空间变化的第二项。
根据本申请实施例的又一方面,还提供了一种图像配准设备,该图像配准设备为计算机设备。图8示出了本申请实施例中的图像配准设备2000的示意图。
如图8所示,所述图像配准设备2000可以包括一个或多个处理器2010, 和一个或多个存储器2020。其中,所述存储器2020中存储有计算机可读代码,所述计算机可读代码当由所述一个或多个处理器2010运行时,可以执行如上所述的图像配准方法。
本申请实施例中的处理器可以是一种集成电路芯片,具有信号的处理能力。上述处理器可以是通用处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现成可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等,可以是X86架构或ARM架构的。
一般而言,本申请的各种示例实施例可以在硬件或专用电路、软件、固件、逻辑,或其任何组合中实施。某些方面可以在硬件中实施,而其他方面可以在可以由控制器、微处理器或其他计算设备执行的固件或软件中实施。当本公开的实施例的各方面被图示或描述为框图、流程图或使用某些其他图形表示时,将理解此处描述的方框、装置、系统、技术或方法可以作为非限制性的示例在硬件、软件、固件、专用电路或逻辑、通用硬件或控制器或其他计算设备,或其某些组合中实施。
例如，根据本申请实施例的方法或装置也可以借助于图9所示的计算设备3000的架构来实现。如图9所示，计算设备3000可以包括总线3010、一个或多个CPU 3020、只读存储器（ROM）3030、随机存取存储器（RAM）3040、连接到网络的通信端口3050、输入/输出组件3060、硬盘3070等。计算设备3000中的存储设备，例如ROM 3030或硬盘3070可以存储本公开提供的图像配准方法的处理和/或通信使用的各种数据或文件以及CPU所执行的程序指令。计算设备3000还可以包括用户界面3080。当然，图9所示的架构只是示例性的，在实现不同的设备时，根据实际需要，可以省略图9示出的计算设备中的一个或多个组件。
根据本申请实施例的又一方面,还提供了一种计算机可读存储介质。图10示出了本申请实施例中的存储介质的示意图4000。
如图10所示，所述计算机存储介质4020上存储有计算机可读指令4010。当所述计算机可读指令4010由处理器运行时，可以执行参照以上附图描述的根据本公开的实施例的图像配准方法。本申请实施例中的计算机可读存储介质可以是易失性存储器或非易失性存储器，或可包括易失性和非易失性存储器两者。非易失性存储器可以是只读存储器（ROM）、可编程只读存储器（PROM）、可擦除可编程只读存储器（EPROM）、电可擦除可编程只读存储器（EEPROM）或闪存。易失性存储器可以是随机存取存储器（RAM），其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器（SRAM）、动态随机存取存储器（DRAM）、同步动态随机存取存储器（SDRAM）、双倍数据速率同步动态随机存取存储器（DDRSDRAM）、增强型同步动态随机存取存储器（ESDRAM）、同步连接动态随机存取存储器（SLDRAM）和直接内存总线随机存取存储器（DR RAM）。应注意，本文描述的方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本申请实施例还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行本申请实施例的图像配准方法。
在上面详细描述的本申请的示例实施例仅仅是说明性的,而不是限制性的。本领域技术人员应该理解,在不脱离本申请的原理和精神的情况下,可对这些实施例或其特征进行各种修改和组合,这样的修改应落入本申请的范围内。

Claims (15)

  1. 一种图像配准方法,由计算机设备执行,包括:
    获取待配准的固定图像和移动图像;
    获得所述固定图像和所述移动图像各自在多个尺度下的特征图;
    对于所述多个尺度中的每个尺度,根据所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述尺度对应的位移场,所述位移场用于指示在所述尺度下从所述移动图像到所述固定图像的映射,在所述位移场中,所述移动图像中的每个体素到所述固定图像中的相应体素的位移向量为多个位移基向量的组合,所述多个位移基向量是基于所述移动图像在所述尺度下的特征图中与所述体素对应的特征向量获得的;
    基于所述多个尺度各自对应的位移场生成最终位移场,并基于所述最终位移场对所述移动图像进行变换处理,以使变换后的移动图像与所述固定图像配准。
  2. 如权利要求1所述的方法,所述根据所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述尺度对应的位移场,包括:
    对于所述移动图像在所述尺度下的特征图中的每个特征向量,基于所述特征向量生成所述特征向量对应的多个位移基向量;基于所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述多个位移基向量各自的权重;
    对于所述移动图像在所述尺度下的特征图中的每个特征向量,基于所述特征向量对应的多个位移基向量及其权重,确定所述尺度对应的位移场。
  3. 如权利要求2所述的方法,所述多个位移基向量及其各自的权重是利用K个注意力头基于所述特征向量生成的,所述K为大于1的整数;
    其中,利用每个所述注意力头生成N个位移基向量;所述N为大于或等于1的整数;
    其中,对于所述K个注意力头中的每个注意力头,利用所述注意力头,基于所述固定图像和所述移动图像各自在所述尺度下的特征图的拼接,生成所述N个位移基向量各自的权重。
  4. 如权利要求3所述的方法,所述基于所述特征向量对应的多个位移基向量及其权重,确定所述尺度对应的位移场,包括:
    对于所述K个注意力头,基于所述N个位移基向量与其各自的权重的相应乘积的平均值,确定所述尺度对应的位移场;
    其中,所述移动图像中的每个体素到所述固定图像中的相应体素的位移向量为所述多个位移基向量的加权和。
  5. 如权利要求1所述的方法,所述获得所述固定图像和所述移动图像各自在多个尺度下的特征图,包括:
    利用基于卷积神经网络的编码器,通过所述编码器中的不同卷积块提取 所述固定图像和所述移动图像各自在不同尺度下的特征图,获得所述固定图像和所述移动图像各自在多个尺度下的特征图;
    其中,所述固定图像或所述移动图像在特定尺度下的特征图的尺寸,是按照相应比例对所述固定图像或所述移动图像进行缩小后的尺寸。
  6. 如权利要求5所述的方法,所述基于所述多个尺度各自对应的位移场生成最终位移场,包括:
    根据尺度大小,对所述多个尺度各自对应的位移场进行级联融合,以将所述多个尺度各自对应的位移场转换为所述移动图像的原始尺寸下的最终位移场。
  7. 如权利要求6所述的方法,所述根据尺度大小,对所述多个尺度各自对应的位移场进行级联融合,包括:
    在所述级联融合中的每级融合中,对上级融合输出的具有特定通道数的特征图进行上采样,得到第一特征图;
    对该级融合对应的位移场进行特征提取,将所述位移场转换为具有所述特定通道数的第二特征图;
    将所述第一特征图与所述第二特征图相加,并与所述固定图像和所述移动图像各自在该级融合对应的尺度下的特征图进行拼接;
    对所述拼接的结果进行特征提取,生成具有所述特定通道数的特征图,作为该级融合的输出。
  8. 如权利要求7所述的方法,所述根据尺度大小,对所述多个尺度各自对应的位移场进行级联融合,还包括:
    对于所述级联融合中的第一级融合输出的具有所述特定通道数的特征图进行特征提取,生成所述最终位移场。
  9. 如权利要求1所述的方法,所述最终位移场的生成基于对配准损失函数的最优化,所述配准损失函数包括用于度量所述变换后的移动图像与所述固定图像之间的相似性的第一项、以及用于对位移场施加惩罚以惩罚所述移动图像的局部空间变化的第二项。
  10. 一种图像配准装置,包括:
    图像获取模块,被配置为获取待配准的固定图像和移动图像;
    特征图提取模块,被配置为获得所述固定图像和所述移动图像各自在多个尺度下的特征图;
    位移场确定模块,被配置为对于所述多个尺度中的每个尺度,根据所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述尺度对应的位移场,所述位移场用于指示在所述尺度下从所述移动图像到所述固定图像的映射;在所述位移场中,所述移动图像中的每个体素到所述固定图像中的相应体素的位移向量为多个位移基向量的组合,所述多个位移基向量是基于所述移动图像在所述尺度下的特征图中与所述体素对应的特征向量获得的;
    位移场合成模块,被配置为基于所述多个尺度各自对应的位移场生成最终位移场,并基于所述最终位移场对所述移动图像进行变换处理,以使变换后的移动图像与所述固定图像配准。
  11. 如权利要求10所述的装置,所述根据所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述尺度对应的位移场,包括:
    对于所述移动图像在所述尺度下的特征图中的每个特征向量,基于所述特征向量生成所述特征向量对应的多个位移基向量;基于所述固定图像和所述移动图像各自在所述尺度下的特征图,确定所述多个位移基向量各自的权重;
    对于所述移动图像在所述尺度下的特征图中的每个特征向量,基于所述特征向量对应的多个位移基向量及其权重,确定所述尺度对应的位移场。
  12. 如权利要求10所述的装置,所述基于所述多个尺度各自对应的位移场生成最终位移场,包括:
    根据尺度大小,对所述多个尺度各自对应的位移场进行级联融合,以将所述多个尺度各自对应的位移场转换为所述移动图像的原始尺寸下的最终位移场;
    其中,所述根据尺度大小,对所述多个尺度各自对应的位移场进行级联融合,包括:
    在所述级联融合中的每级融合中,对上级融合输出的具有特定通道数的特征图进行上采样,得到第一特征图;
    对该级融合对应的位移场进行特征提取,将所述位移场转换为具有所述特定通道数的第二特征图;
    将所述第一特征图与所述第二特征图相加,并与所述固定图像和所述移动图像各自在该级融合对应的尺度下的特征图进行拼接;
    对所述拼接的结果进行特征提取,生成具有所述特定通道数的特征图,作为该级融合的输出。
  13. 一种计算机设备,包括:
    一个或多个处理器;以及
    一个或多个存储器,其中存储有计算机可执行程序,当由所述处理器执行所述计算机可执行程序时,执行权利要求1-9中任一项所述的方法。
  14. 一种计算机程序产品,所述计算机程序产品存储在计算机可读存储介质上,并且包括计算机指令,所述计算机指令在由处理器运行时使得计算机设备执行权利要求1-9中任一项所述的方法。
  15. 一种计算机可读存储介质,其上存储有计算机可执行指令,所述指令在被处理器执行时用于实现如权利要求1-9中任一项所述的方法。
PCT/CN2023/076415 2022-04-29 2023-02-16 图像配准方法、装置、设备和存储介质 WO2023207266A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210478974.5 2022-04-29
CN202210478974.5A CN115115676A (zh) 2022-04-29 2022-04-29 图像配准方法、装置、设备和存储介质

Publications (1)

Publication Number Publication Date
WO2023207266A1 true WO2023207266A1 (zh) 2023-11-02

Family

ID=83326809

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/076415 WO2023207266A1 (zh) 2022-04-29 2023-02-16 图像配准方法、装置、设备和存储介质

Country Status (2)

Country Link
CN (1) CN115115676A (zh)
WO (1) WO2023207266A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911432A (zh) * 2023-12-18 2024-04-19 北京邮电大学 图像分割方法、装置及存储介质
CN118115729A (zh) * 2024-04-26 2024-05-31 齐鲁工业大学(山东省科学院) 多层次多尺度特征交互的图像伪造区域识别方法及系统

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115676A (zh) * 2022-04-29 2022-09-27 腾讯医疗健康(深圳)有限公司 图像配准方法、装置、设备和存储介质
CN116664635B (zh) * 2023-07-31 2023-10-24 柏意慧心(杭州)网络科技有限公司 构建目标对象的多维动态模型的方法、计算设备和介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335777A1 (en) * 2015-05-13 2016-11-17 Anja Borsdorf Method for 2D/3D Registration, Computational Apparatus, and Computer Program
CN113052882A (zh) * 2021-03-26 2021-06-29 上海商汤智能科技有限公司 图像配准方法及相关装置、电子设备、存储介质
CN113112534A (zh) * 2021-04-20 2021-07-13 安徽大学 一种基于迭代式自监督的三维生物医学图像配准方法
CN113850852A (zh) * 2021-09-16 2021-12-28 北京航空航天大学 一种基于多尺度上下文的内窥镜图像配准方法及设备
CN115115676A (zh) * 2022-04-29 2022-09-27 腾讯医疗健康(深圳)有限公司 图像配准方法、装置、设备和存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111243066B (zh) * 2020-01-09 2022-03-22 浙江大学 一种基于自监督学习与生成对抗机制的人脸表情迁移方法
CN111950467B (zh) * 2020-08-14 2021-06-25 清华大学 基于注意力机制的融合网络车道线检测方法及终端设备
CN112150425B (zh) * 2020-09-16 2024-05-24 北京工业大学 一种基于神经网络的无监督血管内超声图像配准方法
CN113516693B (zh) * 2021-05-21 2023-01-03 郑健青 一种快速通用的图像配准方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335777A1 (en) * 2015-05-13 2016-11-17 Anja Borsdorf Method for 2D/3D Registration, Computational Apparatus, and Computer Program
CN113052882A (zh) * 2021-03-26 2021-06-29 上海商汤智能科技有限公司 图像配准方法及相关装置、电子设备、存储介质
CN113112534A (zh) * 2021-04-20 2021-07-13 安徽大学 一种基于迭代式自监督的三维生物医学图像配准方法
CN113850852A (zh) * 2021-09-16 2021-12-28 北京航空航天大学 一种基于多尺度上下文的内窥镜图像配准方法及设备
CN115115676A (zh) * 2022-04-29 2022-09-27 腾讯医疗健康(深圳)有限公司 图像配准方法、装置、设备和存储介质


Also Published As

Publication number Publication date
CN115115676A (zh) 2022-09-27

Similar Documents

Publication Publication Date Title
WO2023207266A1 (zh) 图像配准方法、装置、设备和存储介质
Zhao et al. Dermoscopy image classification based on StyleGAN and DenseNet201
Sanzari et al. Bayesian image based 3d pose estimation
Sigal et al. Combined discriminative and generative articulated pose and non-rigid shape estimation
CN111160214B (zh) 一种基于数据融合的3d目标检测方法
WO2020133636A1 (zh) 前列腺手术中外包膜智能检测和预警方法及系统
KR20210021039A (ko) 이미지 처리 방법, 장치, 전자 기기 및 컴퓨터 판독 가능한 저장 매체
JP2023545199A (ja) モデル訓練方法、人体姿勢検出方法、装置、デバイスおよび記憶媒体
CN112784782B (zh) 一种基于多视角双注意网络的三维物体识别方法
CN113516693B (zh) 一种快速通用的图像配准方法
Wang et al. Automatic vertebrae localization and identification by combining deep SSAE contextual features and structured regression forest
Zhou et al. 3D shape classification and retrieval based on polar view
CN111091010A (zh) 相似度确定、网络训练、查找方法及装置和存储介质
CN114419732A (zh) 基于注意力机制优化的HRNet人体姿态识别方法
CN115496720A (zh) 基于ViT机制模型的胃肠癌病理图像分割方法及相关设备
Wang et al. Multi-view attention-convolution pooling network for 3D point cloud classification
Wang et al. 3D human pose and shape estimation with dense correspondence from a single depth image
Ma et al. Self-supervised method for 3D human pose estimation with consistent shape and viewpoint factorization
CN112329662B (zh) 基于无监督学习的多视角显著性估计方法
CN114240809A (zh) 图像处理方法、装置、计算机设备及存储介质
Li et al. Subpixel image registration algorithm based on pyramid phase correlation and upsampling
Saygili Predicting medical image registration error with block-matching using three orthogonal planes approach
US20230040793A1 (en) Performance of Complex Optimization Tasks with Improved Efficiency Via Neural Meta-Optimization of Experts
Bastiaansen et al. Towards segmentation and spatial alignment of the human embryonic brain using deep learning for atlas-based registration
Li et al. Semantic segmentation of remote sensing image based on bilateral branch network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23794702

Country of ref document: EP

Kind code of ref document: A1