WO2024022060A1 - Image registration method and apparatus, and storage medium - Google Patents
Image registration method and apparatus, and storage medium
- Publication number
- WO2024022060A1 (application PCT/CN2023/105843)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature map
- image
- feature
- attention
- module
- Prior art date
Classifications
- G06N3/02 Neural networks
- G06N3/045 Combinations of networks
- G06N3/0464 Convolutional networks [CNN, ConvNet]
- G06N3/08 Learning methods
- G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33 Image registration using feature-based methods
- G06T7/337 Feature-based image registration involving reference images or patches
- G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale-invariant feature transform [SIFT] or bags of words [BoW]; salient regional features
- G06V10/74 Image or video pattern matching; proximity measures in feature spaces
- G06V10/761 Proximity, similarity or dissimilarity measures
- G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 Image or video recognition or understanding using neural networks
- G06T2207/20081 Training; learning
- G06T2207/20084 Artificial neural networks [ANN]
Definitions
- the invention belongs to the field of image processing technology, and in particular relates to an image registration method, device and storage medium.
- SIFT Scale-invariant feature transform
- the purpose of the present invention is to provide an image registration method, device and storage medium, aiming to solve the problem of low accuracy of existing inter-frame registration methods.
- the present invention provides an image registration method, which method includes:
- a feature extraction network is used to extract features from the first image and the second image respectively to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales;
- the extracted first feature map and second feature map are sequentially input into the attention module for processing, and the first feature map and second feature map after attention processing are obtained;
- an image registration device which includes:
- a feature extraction unit is used to extract features from the first image and the second image respectively using a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
- the attention processing unit is used to sequentially input the extracted first feature map and the second feature map into the attention module for processing, and obtain the first feature map and the second feature map after attention processing;
- a feature point pair acquisition unit is used to determine the similarity between the pixels contained in the first feature map and the second feature map that correspond to the same scale and have been processed by attention, and, based on the determined similarity between the pixels, obtain the matching feature point pairs between the first image and the second image.
- the present invention also provides an electronic device, including: a memory and a processor;
- the memory stores executable program code
- the processor coupled to the memory calls the executable program code stored in the memory to execute the image registration method provided in the above embodiment.
- the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored.
- when the computer program is run by a processor, it implements the image registration method provided in the above embodiments.
- the present invention uses a feature extraction network to extract features from the first image and the second image respectively, obtaining multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales; the extracted first feature maps and second feature maps are input into the attention module in turn for processing to obtain the attention-processed first feature maps and second feature maps; the similarity between the pixels contained in the attention-processed first feature map and second feature map corresponding to the same scale is determined, and, based on the determined similarity between the pixels, the matching feature point pairs between the first image and the second image are obtained. This significantly increases the number of matching feature point pairs obtained, thereby improving the accuracy of image registration.
- Figure 1 is a schematic diagram of matching feature point pairs of bronchial images obtained using the SIFT algorithm
- Figure 2 is a flow chart of the implementation of an image registration method provided by an embodiment of the present application.
- Figure 3A is a flow chart of the implementation of the image registration method provided by an embodiment of the present application.
- Figure 3B is a schematic diagram of matching feature point pairs of bronchial images obtained using the image registration method described in the embodiment of the present application;
- Figure 3C is a schematic diagram of matching feature point pairs of bronchial images obtained using the image registration method described in the embodiment of the present application;
- Figure 3D is a schematic diagram of the attention module in the image registration method provided by the embodiment of the present application.
- Figure 4 is a schematic structural diagram of an image registration device provided by an embodiment of the present application.
- Figure 5 is a schematic structural diagram of an image registration device provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
- the image acquisition device is a bronchoscope.
- using the SIFT algorithm to perform inter-frame registration on consecutive frames collected by a bronchoscope yields only 3 pairs of matching points, shown as the connecting lines in Figure 1; with so few matching points it is difficult to estimate the changes in the position and posture of the bronchoscope, which reduces the accuracy of bronchoscope navigation.
- an embodiment of the present invention provides an implementation process of an image registration method.
- the parts related to the embodiment of the present invention are shown. The details are as follows:
- step S201: a feature extraction network is used to extract features from the first image and the second image respectively to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales;
- Embodiments of the present invention are applicable to electronic devices, which may be mobile phones, tablet computers, wearable devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA) and other equipment; the embodiments of this application place no restriction on the specific type of electronic device.
- the first image and the second image may be images collected by an image acquisition device, where the image acquisition device may be any device with an image acquisition function, such as a bronchoscope, a camera or a video camera; this specification does not limit this.
- a feature extraction network is used to extract features of the first image and the second image respectively, to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales.
- the first image and the second image are two frame images that require matching point pair extraction or registration. Further, the first image and the second image are adjacent frame images captured by the bronchoscope, so as to estimate the position and posture of the bronchoscope.
- the first image and the second image can also be images collected by the bronchoscope with a preset time interval. The preset time can be set according to actual needs, and this specification does not limit this.
- a feature extraction network can be used to extract features of the first image at multiple scales, such as shallow and deep levels.
- the first feature map belonging to the shallow scale has undergone fewer convolutions and has a smaller receptive field, so it usually contains more of the texture features of the first image; the first feature map belonging to the deep scale has undergone more convolutions and has a larger receptive field, so it usually contains more of the semantic information of the first image, where the semantic information may include shape features.
- the process of using the feature extraction network to extract multi-scale features such as shallow and deep layers from the second image is similar to the aforementioned feature extraction process for the first image, and will not be described again here.
- step S202: the extracted first feature maps and second feature maps are sequentially input into the attention module for processing, and the attention-processed first feature maps and second feature maps are obtained;
- in order to pay more attention to the key information that benefits image registration while removing noise interference, the extracted first feature maps and second feature maps are sequentially input into the attention module for processing, and the attention-processed first feature maps and second feature maps are obtained.
- step S203: the similarity between the pixels contained in the first feature map and the second feature map that correspond to the same scale and have been processed by attention is determined, and, based on the determined similarity between the pixels, the matching feature point pairs between the first image and the second image are obtained.
- the embodiment of the present invention uses a feature extraction network to extract features from the first image and the second image respectively, obtains multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales, inputs the extracted first feature maps and second feature maps into the attention module in turn for processing, obtains the attention-processed first feature maps and second feature maps, and determines the similarity between the pixels contained in the attention-processed first and second feature maps corresponding to the same scale, from which the matching feature point pairs are obtained.
- an embodiment of the present invention provides an implementation process of an image registration method.
- the parts related to the embodiment of the present invention are shown. The details are as follows:
- a feature extraction network is used to extract features from the first image and the second image respectively, obtaining multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales.
- the first image and the second image are two frame images that require matching point pair extraction.
- the first image and the second image are adjacent frame images captured by the bronchoscope, so as to estimate the position and posture of the bronchoscope.
- the first image and the second image can also be images collected by the bronchoscope with a preset time interval. The preset time can be set according to actual needs, and this specification does not limit this.
- the feature extraction network includes a twin neural network (a Siamese network) to extract features from the first image and the second image respectively.
- the twin neural network includes a first sub-network and a second sub-network.
- the first sub-network and the second sub-network adopt the same or similar convolutional network structure.
- when extracting the multi-scale features of the first image and the second image through the twin neural network, multi-scale feature extraction is performed on the first image through the first sub-network to obtain multiple first feature maps of the first image at different scales, and multi-scale feature extraction is performed on the second image through the second sub-network to obtain multiple second feature maps of the second image at different scales.
- the plurality of first feature maps include output feature maps of the last N convolution blocks of the first subnetwork
- the plurality of second feature maps include output feature maps of the last N convolution blocks of the second subnetwork.
- the twin neural network includes a first sub-network and a second sub-network, and the first sub-network and the second sub-network are constructed based on the ResNet network.
- the ResNet18 neural network architecture can be used, or a deeper architecture such as ResNet34 or ResNet101 can be used.
- assuming each sub-network contains five convolution blocks, the extracted first feature maps include the output feature maps of the third, fourth and fifth convolution blocks of the first sub-network, and the extracted second feature maps include the output feature maps of the third, fourth and fifth convolution blocks of the second sub-network; extracting multi-scale features from the two frames through the ResNet network in this way can improve the extraction accuracy of subsequent matching feature point pairs.
- the five convolution blocks included in the ResNet network can be represented as conv1_x, conv2_x, conv3_x, conv4_x, conv5_x respectively.
- the first feature maps of the first image extracted by the three convolution blocks conv3_x, conv4_x and conv5_x of the first sub-network can be denoted conv3_xA, conv4_xA and conv5_xA respectively.
- the second feature map of the second image extracted by the three convolution blocks conv3_x, conv4_x, and conv5_x of the second sub-network can be represented by conv3_xB, conv4_xB, and conv5_xB respectively.
- the dimension adjustment module can be used to adjust the dimensions of the obtained first feature maps and second feature maps corresponding to different scales, so that they are consistent in both the spatial dimension and the channel dimension; this facilitates the subsequent splicing or feature interaction of first or second feature maps corresponding to the same scale, and also facilitates the subsequent processing by the attention module.
- the channel dimensions of the first feature map and the second feature map can be unified through one-dimensional convolution, and the spatial dimensions of the first feature map and the second feature map can be unified through deconvolution.
- step S302: any one of the first feature map and the second feature map is input into the channel attention module to perform the channel attention operation, and the result of the channel attention operation is input into the spatial attention module to perform the spatial attention operation, obtaining the corresponding attention-processed feature map;
- the attention module includes a channel attention module and a spatial attention module, and the channel attention module is deployed before the spatial attention module.
- the channel attention module performs the channel attention operation on the first feature map, and the result of the channel attention operation corresponding to the first feature map is input into the spatial attention module; the spatial attention module performs the spatial attention operation on that result, so that the corresponding attention-processed first feature map is obtained. Likewise, the result of the channel attention operation on the second feature map is input into the spatial attention module, which performs the spatial attention operation on it, so that the corresponding attention-processed second feature map is obtained.
- the channel attention module may include a first global pooling layer, a first one-dimensional convolution layer and a first coefficient calculation layer.
- when any one of the first feature map and the second feature map is input into the channel attention module to perform the channel attention operation, the process includes: calculating, through the first global pooling layer, the maximum value of the feature map in the spatial dimension, which reduces the dimension of the feature map and mitigates overfitting, to obtain the corresponding third feature map; performing a one-dimensional convolution calculation on the channel dimension of the third feature map through the first one-dimensional convolution layer; and normalizing the feature map after the one-dimensional convolution calculation through the first coefficient calculation layer to obtain the channel attention coefficient. The obtained channel attention coefficient is then used to process the input feature map to obtain the result of the channel attention operation, thereby improving the accuracy of the channel attention calculation; for example, the channel attention coefficient is multiplied with the input first or second feature map to obtain the feature map after the channel attention operation. More specifically, assume the input feature map X of the channel attention module has dimensions H*W*C, where H is the height of the feature map, W its width and C the number of channels. A top-k pooling operation computes the largest K values of X over the H*W dimension, giving an output Z of dimension C*K; one-dimensional convolutions then perform one-dimensional convolution calculations separately over dimension C and over dimension K, and a final Softmax normalization yields the channel attention coefficient. Multiplying the channel attention coefficient with the original feature map X gives a new feature map, which is the output of the channel attention module and also the input of the spatial attention module.
- the spatial attention module may include a second global pooling layer, a second one-dimensional convolution layer and a second coefficient calculation layer connected in sequence. When the result of the channel attention operation is input into the spatial attention module to perform the spatial attention operation, the process includes: calculating, through the second global pooling layer, the maximum value of the input feature map over the channel dimension to obtain the corresponding fourth feature map; performing a one-dimensional convolution calculation on the spatial dimensions of the fourth feature map through the second one-dimensional convolution layer; and normalizing the resulting features through the second coefficient calculation layer to obtain the spatial attention coefficient. The spatial attention coefficient is then used to process the input feature map of the spatial attention module to obtain the corresponding attention-processed first feature map or second feature map, thereby improving the accuracy of the spatial attention calculation; for example, the spatial attention coefficient is multiplied with the input feature map of the spatial attention module. More specifically, assume the input feature map of the spatial attention module is X' with dimensions H*W*C. A global pooling operation first computes the maximum of X' over the C dimension, giving an output Z' of dimension H*W; two one-dimensional convolutions then perform one-dimensional convolution calculations over the two dimensions H and W of Z', and a final Softmax normalization yields the spatial attention coefficient. Multiplying the spatial attention coefficient with the original feature X' gives the new feature output, i.e. the attention-processed first feature map or second feature map. As an example, the structure of the attention module is shown in Figure 3D.
- if the attention module includes a channel attention module and a spatial attention module and the spatial attention module is deployed before the channel attention module, then when the extracted first feature maps and second feature maps are sequentially input into the attention module for processing, any one of the first feature map and the second feature map is input into the spatial attention module to perform the spatial attention operation, and the result of the spatial attention operation is input into the channel attention module to perform the channel attention operation, obtaining the corresponding attention-processed first feature map or second feature map.
- the structures of the channel attention module and the spatial attention module are similar to the previous embodiments, and will not be described again here.
- deploying the channel attention module before the spatial attention module yields more accurate attention-processed first and second feature maps; the resulting feature maps focus more on the key information in the image, which is conducive to subsequently extracting more matching feature point pairs, and to extracting more accurate ones.
- step S303: the similarity between the pixels contained in the first feature map and the second feature map that correspond to the same scale and have been processed by attention is determined, and, based on the determined similarity between the pixels, the matching feature point pairs between the first image and the second image are obtained.
- any target feature map pair is obtained, where each target feature map pair includes a first feature map and a second feature map that belong to the same scale and have been processed by attention; a feature interaction operation is performed on the target feature map pair to obtain the interactive feature map corresponding to that target feature map pair;
- the interactive feature map is input into the pre-trained first convolutional network to obtain the separation result output by the first convolutional network; the separation result includes a first separation feature map corresponding to the first feature map, a second separation feature map corresponding to the second feature map, and the similarity between the pixels contained in the first separation feature map and the second separation feature map. Performing the above feature interaction operation can effectively improve the accuracy of the determined similarity between the pixels of same-scale first and second feature maps.
- the interactive feature map corresponding to any target feature map pair is a four-dimensional tensor
- the first convolutional network is a four-dimensional convolutional network.
- preferably, a four-dimensional convolution calculation is performed on each interactive feature map, and feature separation is then performed on the feature map produced by the four-dimensional convolution. As an example, taking the aforementioned interactive feature maps conv3AB, conv4AB and conv5AB, four-dimensional convolution calculations are performed on conv3AB, conv4AB and conv5AB respectively to capture the features common to the first image and the second image; the feature map output after the four-dimensional convolution calculation has dimensions 2*H*W*C and is then separated.
- conv3AB is separated into conv3A and conv3B, conv4AB into conv4A and conv4B, and conv5AB into conv5A and conv5B; the dimensions of the separated feature maps are all H*W*C.
- afterwards, deconvolution and linear interpolation can be used so that the H*W of each separated feature map corresponds to the height and width of the original first and second images, after which the similarity between pixels at the same position in the corresponding separated feature maps is determined.
- the corresponding pixels contained in the first separation feature map and the second separation feature map whose similarity is not less than a set threshold are determined to be matching feature point pairs, so that only the corresponding pixels with higher similarity are selected as matching feature point pairs rather than all pixels, which saves computing resources.
- taking the bronchus image described in the background art as an example, a large number of matching feature point pairs can be extracted using the method described in this embodiment; the obtained matching feature point pairs are shown in Figure 3B, and Figure 3C shows the matching feature points obtained in another image processing instance. It can be seen from Figure 1, Figure 3B and Figure 3C that the matching feature point pairs extracted using the method described in this embodiment are far richer than those extracted using the SIFT algorithm, so using the matching feature point pairs shown in Figure 3B improves the accuracy of the determined position and posture of the bronchoscope, which benefits subsequent bronchoscope navigation. In particular, based on the change information of the matching feature point pairs across different frames, the change in the bronchoscope's pose between collecting the previous frame and collecting the next frame can be determined, which helps accurately predict the current pose of the bronchoscope.
- optionally, when determining the similarity between the pixels contained in the first feature map and the second feature map corresponding to the same scale, the attention-processed first feature map and second feature map belonging to the same scale can instead be selected and directly input into a pre-trained second convolutional network, which outputs the similarity between the pixels contained in the selected first feature map and the selected second feature map; this simplifies obtaining the similarity between the pixels of same-scale feature maps and improves processing efficiency.
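As an illustration of this variant, a hypothetical sketch follows; the patent does not specify the second convolutional network's architecture, so the design below (stacking the two feature maps and predicting a per-pixel similarity map with a small convolutional head) and the name PairwiseSimilarityNet are assumptions:

```python
import torch
import torch.nn as nn

class PairwiseSimilarityNet(nn.Module):
    # Hypothetical "second convolutional network": the two same-scale,
    # attention-processed feature maps are concatenated along channels and
    # a small head predicts one similarity score per pixel position.
    def __init__(self, channels):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid(),                      # similarity in [0, 1]
        )

    def forward(self, fa, fb):                 # both: (B, C, H, W)
        return self.head(torch.cat([fa, fb], dim=1)).squeeze(1)  # (B, H, W)
```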
- the feature extraction unit 41 is used to extract features from the first image and the second image respectively using a feature extraction network to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales;
- the attention processing unit 42 is configured to sequentially input the extracted first feature map and second feature map into the attention module for processing, and obtain the first feature map and the second feature map after attention processing;
- the feature point pair acquisition unit 43 is used to determine the similarity between the pixels contained in the first feature map and the second feature map that correspond to the same scale and have been processed by attention, and, based on the determined similarity between the pixels, obtain the matching feature point pairs between the first image and the second image.
- the embodiment of the present invention uses a feature extraction network to extract features from the first image and the second image respectively to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales, inputs the extracted first feature maps and second feature maps into the attention module in turn for processing, obtains the attention-processed first feature maps and second feature maps, determines the similarity between the pixels contained in the attention-processed first and second feature maps corresponding to the same scale, and, based on the determined similarity between the pixels, obtains the matching feature point pairs between the first image and the second image, thereby improving the accuracy of image registration.
- the feature extraction unit 51 is used to extract features from the first image and the second image respectively using a feature extraction network to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales;
- the attention processing unit 52 is configured to sequentially input the extracted first feature map and second feature map into the attention module for processing, and obtain the first feature map and the second feature map after attention processing;
- the feature point pair acquisition unit 53 is used to determine the similarity between the pixels contained in the first feature map and the second feature map that correspond to the same scale and have been processed by attention, and, based on the determined similarity between the pixels, obtain the matching feature point pairs between the first image and the second image.
- the feature extraction network includes a twin neural network.
- the twin neural network includes a first sub-network and a second sub-network with the same structure and shared weights.
- the feature extraction unit 51 includes:
- the first image obtaining unit 511 is configured to use the first sub-network to process the first image and obtain multiple first feature maps of the first image at different scales;
- the second image obtaining unit 512 is configured to use a second sub-network to process the second image and obtain a plurality of second feature maps of the second image at different scales.
- the first image and the second image are adjacent frame images captured by the bronchoscope.
- the attention module includes a channel attention module and a spatial attention module.
- the attention processing unit 52 includes:
- the first processing unit 521 is used to input any one of the first feature map and the second feature map into the channel attention module to perform the channel attention operation, and to input the result of the channel attention operation into the spatial attention module to perform the spatial attention operation, obtaining the corresponding attention-processed feature map.
- the channel attention module includes a first global pooling layer, a first one-dimensional convolution layer and a first coefficient calculation layer
- the first processing unit 521 includes:
- the first maximum value calculation unit is used to calculate the maximum value of any feature map in the spatial dimension of the first feature map and the second feature map through the first global pooling layer, and obtain the corresponding third feature map;
- the first convolution calculation unit is used to perform one-dimensional convolution calculation on the channel dimension of the third feature map through the first one-dimensional convolution layer;
- the first normalization unit is used to normalize the feature map after one-dimensional convolution calculation through the first coefficient calculation layer to obtain the channel attention coefficient;
- the first multiplication unit is used to use the obtained channel attention coefficient to process any input feature map.
- the spatial attention module includes a second global pooling layer, a second one-dimensional convolution layer and a second coefficient calculation layer connected in sequence.
- the first processing unit 521 also includes:
- the second maximum value calculation unit is used to calculate the maximum value of the input feature map of the spatial attention module in the channel dimension through the second global pooling layer to obtain the corresponding fourth feature map;
- the second convolution calculation unit is used to perform one-dimensional convolution calculation on the spatial dimension of the fourth feature map through the second one-dimensional convolution layer;
- the second normalization unit is used to normalize the features after one-dimensional convolution calculation through the second coefficient calculation layer to obtain the spatial attention coefficient
- the second multiplication unit is used to use the spatial attention coefficient to process the input feature map of the spatial attention module.
- the attention processing unit 52 includes:
- the second processing unit is used to input any one of the first feature map and the second feature map into the spatial attention module to perform the spatial attention operation, and to input the result of the spatial attention operation into the channel attention module to perform the channel attention operation, obtaining the corresponding attention-processed feature map.
- the image registration device also includes:
- the dimension adjustment unit is used to use the dimension adjustment module to adjust the dimensions of the obtained first feature maps and second feature maps corresponding to different scales, so that the first feature maps and second feature maps input into the attention module are consistent in both the spatial dimension and the channel dimension.
- the feature point pair acquisition unit 53 includes:
- a feature map pair acquisition unit is used to acquire any target feature map pair, wherein each target feature map pair includes a first feature map and a second feature map that belong to the same scale and have been processed by attention;
- the feature interaction unit is used to perform feature interaction operations on the target feature map pair and obtain the interactive feature map corresponding to any target feature map pair;
- the feature map separation unit is used to input the interactive feature map into the pre-trained first convolutional network and obtain the separation result output by the first convolutional network, where the separation result includes the first separation feature map corresponding to the first feature map, the second separation feature map corresponding to the second feature map, and the similarity between the pixels contained in the first separation feature map and the second separation feature map.
- the feature point pair acquisition unit 53 also includes:
- the feature point pair determination unit is used to determine the corresponding pixel points contained in the first separation feature map and the second separation feature map whose similarity is not less than a set threshold as matching feature point pairs.
- the feature point pair acquisition unit 53 includes:
- a feature map selection unit used to select the first feature map and the second feature map that belong to the same scale and have been processed by attention;
- the similarity acquisition unit is used to input the selected first feature map and second feature map into the pre-trained second convolutional network, and to obtain, as output by the second convolutional network, the similarity between the pixels contained in the selected first feature map and the selected second feature map.
- each unit or module of the image registration device can be implemented by a corresponding hardware or software unit or module; each unit or module can be an independent software or hardware unit or module, or can be integrated into a single software or hardware unit or module. This division of units or modules is not intended to limit the present invention.
- FIG. 6 a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application is shown.
- the electronic device may be any of various types of computer system devices that are non-removable or removable or portable and perform wireless or wired communications.
- the electronic device can be a desktop computer, a server, a mobile phone or smart phone (for example, an iPhone(TM) or an Android(TM)-based phone), a portable gaming device (for example, Nintendo DS(TM), PlayStation Portable(TM), Gameboy Advance(TM), iPhone(TM)), a laptop computer, a PDA, a portable Internet device, a portable medical device, a smart camera, a music player or data storage device, or another handheld device such as a watch, an earphone or a pendant; the electronic device can also be another wearable device (for example, electronic glasses, electronic clothes, an electronic bracelet, an electronic necklace, or another head-mounted device (HMD)).
- the electronic device 6 may include control circuitry, which may include storage and processing circuitry 61 .
- the storage and processing circuitry 61 may include memory, such as hard-drive memory, non-volatile memory (such as flash memory or other electronically programmable erase-limited memory used to form solid-state drives), and volatile memory (such as static or dynamic random access memory); the embodiments of this application are not limited in this respect.
- the processing circuitry in the storage and processing circuitry 61 may be used to control the operation of the electronic device 6 .
- the processing circuit can be implemented based on one or more microprocessors, microcontrollers, digital signal processors, baseband processors, power management units, audio codec chips, application specific integrated circuits, display driver integrated circuits, etc.
- the storage and processing circuit 61 may be used to run software in the electronic device 6, such as Internet browsing applications, Voice over Internet Protocol (VOIP) phone calling applications, email applications, media playback applications, operating system functions, etc.
- this software can be used to perform control operations such as camera-based image acquisition, ambient-light measurement based on an ambient light sensor, proximity measurement based on a proximity sensor, information display based on status indicators such as LEDs, touch event detection based on a touch sensor, functions associated with displaying information on multiple (e.g. layered) displays, operations associated with performing wireless communication functions, operations associated with collecting and generating audio signals, control operations associated with collecting and processing button-press event data, and other functions in the electronic device 6; the embodiments of this application are not limited in this respect.
- the memory stores executable program code
- a processor coupled to the memory calls the executable program code stored in the memory to execute the image registration method described in the foregoing embodiments, for example the method described in steps S201-S203 of Figure 2.
- the executable program code includes the various units or modules of the image registration device described in the previous embodiments, such as modules 41-43 in Figure 4.
- the specific processes by which the above units or modules implement their respective functions will not be described again here.
- embodiments of the present application also provide a non-transitory computer-readable storage medium.
- the non-transitory computer-readable storage medium can be configured in the server in the above embodiments.
- a computer program is stored on the non-transitory computer-readable storage medium, and when the program is executed by a processor, it implements the image registration method described in the foregoing method embodiments.
- modules/units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented with electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered to be beyond the scope of the present invention.
- the disclosed device/terminal and method can be implemented in other ways.
- the device/terminal embodiments described above are only illustrative.
- the division of modules or units is only a logical functional division; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
- the mutual coupling, direct coupling or communication connection shown or discussed may be achieved through certain interfaces, or by indirect coupling or communication connection between devices or units, and may be electrical, mechanical or in other forms.
- a unit described as a separate component may or may not be physically separate.
- a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or it may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
- the above integrated units can be implemented in the form of hardware or software functional units.
- if the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the present invention can implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program.
- the computer program can be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments.
- the computer program includes computer program code, and the computer program code can be in the form of source code, object code, executable file or some intermediate form, etc.
- Computer-readable media may include: any entity or device capable of carrying computer program code, recording media, USB flash drives, removable hard drives, magnetic disks, optical discs, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, etc. It should be noted that the content a computer-readable medium may contain can be expanded or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
Abstract
An image registration method and apparatus, and a storage medium, applicable to the field of image processing technology. The method comprises: using a feature extraction network to extract features from a first image and a second image respectively, to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales (S201); inputting the extracted first feature maps and second feature maps into an attention module in turn for processing, to obtain attention-processed first feature maps and second feature maps (S202); and determining the similarity between pixels contained in the attention-processed first feature map and second feature map corresponding to the same scale, and, on the basis of the determined similarity between pixels, acquiring matching feature point pairs between the first image and the second image (S203), thereby improving the accuracy of image registration.
Description
The present invention belongs to the field of image processing technology, and in particular relates to an image registration method, apparatus and storage medium.
Current inter-frame image registration typically uses traditional algorithms such as scale-invariant feature transform (SIFT). However, traditional algorithms such as SIFT usually find only a small number of matching points, and with only a few matching points it is often difficult to estimate the pose change of the image acquisition device, so its changes in position and posture cannot be obtained accurately.
Summary of the Invention
The purpose of the present invention is to provide an image registration method, apparatus and storage medium, aiming to solve the problem of the low accuracy of existing inter-frame registration methods.
In one aspect, the present invention provides an image registration method, the method comprising:
using a feature extraction network to extract features from a first image and a second image respectively, to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales;
inputting the extracted first feature maps and second feature maps into an attention module in turn for processing, to obtain attention-processed first feature maps and second feature maps; and
determining the similarity between pixels contained in the attention-processed first feature map and second feature map corresponding to the same scale, and, on the basis of the determined similarity between pixels, acquiring matching feature point pairs between the first image and the second image.
In another aspect, the present invention provides an image registration apparatus, the apparatus comprising:
a feature extraction unit, configured to use a feature extraction network to extract features from a first image and a second image respectively, to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales;
an attention processing unit, configured to input the extracted first feature maps and second feature maps into an attention module in turn for processing, to obtain attention-processed first feature maps and second feature maps; and
a feature point pair acquisition unit, configured to determine the similarity between pixels contained in the attention-processed first feature map and second feature map corresponding to the same scale, and, on the basis of the determined similarity between pixels, acquire matching feature point pairs between the first image and the second image.
In another aspect, the present invention further provides an electronic device, comprising a memory and a processor;
the memory stores executable program code; and
the processor, coupled to the memory, calls the executable program code stored in the memory to execute the image registration method provided in the above embodiments.
In another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when run by a processor, the computer program implements the image registration method provided in the above embodiments.
The present invention uses a feature extraction network to extract features from the first image and the second image respectively to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales, inputs the extracted first feature maps and second feature maps into the attention module in turn for processing to obtain attention-processed first feature maps and second feature maps, determines the similarity between pixels contained in the attention-processed first feature map and second feature map corresponding to the same scale, and, on the basis of the determined similarity between pixels, acquires matching feature point pairs between the first image and the second image. This significantly increases the number of matching feature point pairs obtained, thereby improving the accuracy of image registration.
Figure 1 is a schematic diagram of matching feature point pairs of bronchial images obtained using the SIFT algorithm;
Figure 2 is a flow chart of an implementation of the image registration method provided by an embodiment of the present application;
Figure 3A is a flow chart of an implementation of the image registration method provided by an embodiment of the present application;
Figure 3B is a schematic diagram of matching feature point pairs of bronchial images obtained using the image registration method described in an embodiment of the present application;
Figure 3C is a schematic diagram of matching feature point pairs of bronchial images obtained using the image registration method described in an embodiment of the present application;
Figure 3D is a schematic diagram of the attention module in the image registration method provided by an embodiment of the present application;
Figure 4 is a schematic structural diagram of the image registration apparatus provided by an embodiment of the present application;
Figure 5 is a schematic structural diagram of the image registration apparatus provided by an embodiment of the present application;
Figure 6 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
To make the purpose, technical solution and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
The specific implementation of the present invention is described in detail below with reference to specific embodiments:
Current inter-frame image registration typically uses traditional algorithms such as scale-invariant feature transform (SIFT). As shown in Figure 1, taking a bronchoscope as the image acquisition device, using the SIFT algorithm to perform inter-frame registration on consecutive frames collected by the bronchoscope yields only the 3 pairs of matching points indicated by the connecting lines in Figure 1; with only these 3 pairs of matching points it is difficult to estimate the pose change of the bronchoscope, which reduces the accuracy of bronchoscope navigation.
Referring to Figure 2, an embodiment of the present invention provides an implementation flow of an image registration method. For ease of description, only the parts relevant to the embodiment of the present invention are shown, detailed as follows:
In step S201, a feature extraction network is used to extract features from the first image and the second image respectively, to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales.
The embodiments of the present invention are applicable to electronic devices, which may be mobile phones, tablet computers, wearable devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA) and other equipment; the embodiments of this application place no restriction on the specific type of electronic device.
In the embodiment of the present invention, the first image and the second image may be images collected by an image acquisition device, where the image acquisition device may be any device with an image acquisition function, such as a bronchoscope, a camera or a video camera; this specification does not limit this.
In the embodiment of the present invention, a feature extraction network is used to extract features from the first image and the second image respectively, to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales. The first image and the second image are two frames for which matching point pair extraction or registration is required. Further, the first image and the second image are adjacent frames captured by a bronchoscope, so that the pose of the bronchoscope can be estimated. Of course, the first image and the second image may also be images collected by the bronchoscope at a preset time interval; the preset interval can be set according to actual needs, and this specification does not limit this.
In the embodiment of the present invention, the feature extraction network can extract features of the first image at multiple scales, such as shallow and deep levels. A first feature map at a shallow scale has undergone fewer convolutions and has a smaller receptive field, so it usually contains more of the texture features of the first image; a first feature map at a deep scale has undergone more convolutions and has a larger receptive field, so it usually contains more of the semantic information of the first image, where the semantic information may include shape features. The process of using the feature extraction network to extract shallow, deep and other multi-scale features from the second image is similar to the aforementioned feature extraction process for the first image and is not repeated here.
In step S202, the extracted first feature maps and second feature maps are sequentially input into the attention module for processing, and the attention-processed first feature maps and second feature maps are obtained.
In the embodiment of the present invention, in order to pay more attention to the key information that benefits image registration while removing the interference of noise, the extracted first feature maps and second feature maps are sequentially input into the attention module for processing, and the attention-processed first feature maps and second feature maps are obtained.
In step S203, the similarity between the pixels contained in the attention-processed first feature map and second feature map corresponding to the same scale is determined, and, based on the determined similarity between the pixels, the matching feature point pairs between the first image and the second image are acquired.
The embodiment of the present invention uses a feature extraction network to extract features from the first image and the second image respectively to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales, inputs the extracted first feature maps and second feature maps into the attention module in turn for processing to obtain attention-processed first feature maps and second feature maps, determines the similarity between pixels contained in the attention-processed first feature map and second feature map corresponding to the same scale, and, based on the determined similarity between pixels, acquires the matching feature point pairs between the first image and the second image; the multi-scale feature maps and the attention processing of the feature maps thus improve the accuracy of image registration.
Referring to Figure 3A, an embodiment of the present invention provides an implementation flow of an image registration method. For ease of description, only the parts relevant to the embodiment of the present invention are shown, detailed as follows:
In step S301, a feature extraction network is used to extract features from the first image and the second image respectively, to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales.
In the embodiment of the present invention, a feature extraction network is used to extract features from the first image and the second image respectively, to obtain multiple first feature maps of the first image at different scales and multiple second feature maps of the second image at different scales. The first image and the second image are two frames for which matching point pair extraction is required. Further, the first image and the second image are adjacent frames captured by a bronchoscope, so that the pose of the bronchoscope can be estimated. Of course, the first image and the second image may also be images collected by the bronchoscope at a preset time interval; the preset interval can be set according to actual needs, and this specification does not limit this.
In a specific embodiment of the present application, the feature extraction network includes a twin (Siamese) neural network, which extracts features from the first image and the second image respectively. Further, the twin neural network includes a first sub-network and a second sub-network that adopt the same or similar convolutional network structure. When extracting the multi-scale features of the first image and the second image through the twin neural network, multi-scale feature extraction is performed on the first image through the first sub-network to obtain multiple first feature maps of the first image at different scales, and multi-scale feature extraction is performed on the second image through the second sub-network to obtain multiple second feature maps of the second image at different scales. More specifically, the multiple first feature maps include the output feature maps of the last N convolution blocks of the first sub-network, and the multiple second feature maps include the output feature maps of the last N convolution blocks of the second sub-network.
In a preferred embodiment of the present application, the twin neural network includes a first sub-network and a second sub-network, and the first sub-network and the second sub-network are constructed based on the ResNet network; the ResNet18 neural network architecture can be used, or a deeper architecture such as ResNet34 or ResNet101 can be used. Assuming the ResNet network in each of the first and second sub-networks contains five convolution blocks, the extracted first feature maps include the output feature maps of the third, fourth and fifth convolution blocks of the first sub-network, and the extracted second feature maps include the output feature maps of the third, fourth and fifth convolution blocks of the second sub-network; extracting multi-scale features from the two frames through the ResNet network in this way can improve the extraction accuracy of subsequent matching feature point pairs. As an example, the five convolution blocks of the ResNet network can be denoted conv1_x, conv2_x, conv3_x, conv4_x and conv5_x; the first feature maps of the first image extracted by the three convolution blocks conv3_x, conv4_x and conv5_x of the first sub-network can be denoted conv3_xA, conv4_xA and conv5_xA, and the second feature maps of the second image extracted by the three convolution blocks conv3_x, conv4_x and conv5_x of the second sub-network can be denoted conv3_xB, conv4_xB and conv5_xB.
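As a concrete illustration of the weight-shared, multi-scale extraction described above, the following is a minimal PyTorch sketch assuming a torchvision ResNet18 trunk, whose layer2, layer3 and layer4 correspond to the conv3_x, conv4_x and conv5_x blocks in the patent's numbering; the framework, input size and training details are not fixed by the patent:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SiameseBackbone(nn.Module):
    # Twin (Siamese) trunk: one set of weights processes both frames, so the
    # two "sub-networks" have identical structure and shared weights.
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        # conv1_x / conv2_x of the patent's numbering
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu,
                                  net.maxpool, net.layer1)
        self.conv3_x, self.conv4_x, self.conv5_x = net.layer2, net.layer3, net.layer4

    def extract(self, img):                  # img: (B, 3, H, W)
        x = self.stem(img)
        c3 = self.conv3_x(x)                 # shallow scale: richer texture
        c4 = self.conv4_x(c3)
        c5 = self.conv5_x(c4)                # deep scale: richer semantics
        return c3, c4, c5

    def forward(self, img_a, img_b):
        return self.extract(img_a), self.extract(img_b)
```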
Optionally, in another embodiment of the present application, after the multiple first feature maps of the first image at different scales and the multiple second feature maps of the second image at different scales are obtained through step S301, a dimension adjustment module can be used to adjust the dimensions of the obtained first feature maps and second feature maps corresponding to different scales, so that the resulting first feature maps and second feature maps are consistent in both the spatial dimension and the channel dimension; this facilitates the subsequent splicing or feature interaction of first or second feature maps corresponding to the same scale, and also facilitates the subsequent processing by the attention module. Specifically, the channel dimensions of the first feature maps and the second feature maps can be unified through one-dimensional convolution, and their spatial dimensions can be unified through deconvolution.
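A minimal sketch of such a dimension adjustment module follows; the patent names the operations (a convolution to unify channels, deconvolution to unify spatial size) but not their parameters, so the 1x1 kernel, target channel count and upscale factor are assumptions:

```python
import torch
import torch.nn as nn

class DimAdjust(nn.Module):
    # Brings a feature map to a common channel count and spatial size so that
    # same-scale maps can be spliced or feature-interacted later.
    def __init__(self, in_ch, out_ch=256, upscale=2):
        super().__init__()
        self.channel_unify = nn.Conv2d(in_ch, out_ch, kernel_size=1)   # unify C
        self.spatial_unify = nn.ConvTranspose2d(out_ch, out_ch,
                                                kernel_size=upscale,
                                                stride=upscale)        # unify H, W

    def forward(self, fmap):  # (B, in_ch, H, W) -> (B, out_ch, upscale*H, upscale*W)
        return self.spatial_unify(self.channel_unify(fmap))
```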
In step S302, either of the first and second feature maps is fed into a channel attention module to perform a channel attention operation, and the result of the channel attention operation is fed into a spatial attention module to perform a spatial attention operation, yielding the corresponding attention-processed feature map.
In this embodiment of the invention, the attention module includes a channel attention module and a spatial attention module, with the channel attention module deployed before the spatial attention module. In this case, when the extracted first and second feature maps are each fed into the attention module for processing: the extracted first feature map may be fed into the channel attention module, which performs the channel attention operation on the first feature map; the result of this channel attention operation is then fed into the spatial attention module, which performs the spatial attention operation on that result, yielding the corresponding attention-processed first feature map. Likewise, the extracted second feature map may be fed into the channel attention module, which performs the channel attention operation on the second feature map; the result of this channel attention operation is fed into the spatial attention module, which performs the spatial attention operation on it, yielding the corresponding attention-processed second feature map.
Specifically, the channel attention module may include a first global pooling layer, a first one-dimensional convolution layer and a first coefficient calculation layer. In this case, feeding either of the first and second feature maps into the channel attention module to perform the channel attention operation includes: computing, through the first global pooling layer, the maximum of that feature map over the spatial dimensions, which reduces the dimensionality of the feature map and reduces overfitting, yielding a corresponding third feature map; performing, through the first one-dimensional convolution layer, a one-dimensional convolution over the channel dimension of the third feature map; normalizing, through the first coefficient calculation layer, the feature map after the one-dimensional convolution to obtain channel attention coefficients; and finally processing the input feature map with the obtained channel attention coefficients to obtain the result of the channel attention operation, thereby improving the accuracy of the channel attention computation — for example, multiplying the channel attention coefficients with the feature map fed into the channel attention module to obtain the feature map after the channel attention operation. More specifically, suppose the input feature map X of the channel attention module has dimensions H*W*C, where H is the height of the feature map, W its width and C the number of channels. A top-k pooling operation computes the K largest values of X over the H*W dimensions, yielding an output Z of dimensions C*K; one-dimensional convolutions are then computed along dimension C and along dimension K respectively; finally a Softmax function performs the normalization, yielding the channel attention coefficients. Multiplying the channel attention coefficients with the original feature map X gives a new feature map, which is the output of the channel attention module and the input of the spatial attention module.
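The following sketch illustrates one plausible reading of this channel attention branch in PyTorch; the choice k=4, the kernel size 3, the depthwise arrangement of the two one-dimensional convolutions, and the mean over K before the Softmax are all assumptions made for illustration, not the required design:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Top-k pooling over H*W, 1-D convolutions along K and along C, Softmax weights."""
    def __init__(self, channels, k=4, kernel_size=3):
        super().__init__()
        self.k = k
        pad = kernel_size // 2
        self.conv_k = nn.Conv1d(channels, channels, kernel_size,
                                padding=pad, groups=channels)  # slides along K
        self.conv_c = nn.Conv1d(k, k, kernel_size,
                                padding=pad, groups=k)         # slides along C

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        z = x.flatten(2).topk(self.k, dim=2).values         # (B, C, K): top-k pooling
        z = self.conv_k(z)                                  # 1-D conv along K
        z = self.conv_c(z.transpose(1, 2)).transpose(1, 2)  # 1-D conv along C
        coeff = torch.softmax(z.mean(dim=2), dim=1)         # (B, C) channel coefficients
        return x * coeff.view(b, c, 1, 1)                   # reweight the input map

out = ChannelAttention(channels=256)(torch.randn(1, 256, 28, 28))
```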
Specifically, the spatial attention module may include a second global pooling layer, a second one-dimensional convolution layer and a second coefficient calculation layer connected in sequence. Feeding the result of the channel attention operation into the spatial attention module to perform the spatial attention operation includes: computing, through the second global pooling layer, the maximum of the spatial attention module's input feature map over the channel dimension, yielding a corresponding fourth feature map; performing, through the second one-dimensional convolution layer, a one-dimensional convolution over the spatial dimensions of the fourth feature map; normalizing, through the second coefficient calculation layer, the features after the one-dimensional convolution to obtain spatial attention coefficients; and processing the input feature map of the spatial attention module with the spatial attention coefficients to obtain the corresponding attention-processed first or second feature map, thereby improving the accuracy of the spatial attention computation — for example, multiplying the spatial attention coefficients with the spatial attention module's input feature map to obtain the corresponding attention-processed first or second feature map. More specifically, let the input feature map of the spatial attention module be X', of dimensions H*W*C. First a global pooling operation computes the maximum of X' over dimension C, yielding an output Z' of dimensions H*W; two one-dimensional convolutions are then computed along the two dimensions H and W of Z' respectively; finally a Softmax function normalizes the features after the one-dimensional convolutions, yielding the spatial attention coefficients. Multiplying the spatial attention coefficients with the original features X' gives the new feature output, i.e. the attention-processed first or second feature map. As an example, the structure of the attention module may be seen in Figure 3D.
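Analogously, a hedged sketch of the spatial attention branch follows; the kernel size and the summation of the two one-dimensional convolution responses before the Softmax are illustrative assumptions rather than prescribed choices:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Max over channels, 1-D convolutions along H and along W, Softmax over positions."""
    def __init__(self, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_h = nn.Conv1d(1, 1, kernel_size, padding=pad)
        self.conv_w = nn.Conv1d(1, 1, kernel_size, padding=pad)

    def forward(self, x):                                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        z = x.max(dim=1).values                           # (B, H, W): max over channels
        zw = self.conv_w(z.reshape(b * h, 1, w)).reshape(b, h, w)   # 1-D conv along W
        zh = self.conv_h(z.transpose(1, 2).reshape(b * w, 1, h))    # 1-D conv along H
        zh = zh.reshape(b, w, h).transpose(1, 2)
        coeff = torch.softmax((zw + zh).flatten(1), dim=1).reshape(b, 1, h, w)
        return x * coeff                                  # reweight each spatial position

out = SpatialAttention()(torch.randn(1, 256, 28, 28))
```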
Optionally, in another implementation of the present application, if the attention module includes a channel attention module and a spatial attention module with the spatial attention module deployed before the channel attention module, then when the extracted first and second feature maps are fed in turn into the attention module for processing to obtain the attention-processed first and second feature maps, either of the first and second feature maps is fed into the spatial attention module to perform the spatial attention operation, and the result of the spatial attention operation is fed into the channel attention module to perform the channel attention operation, yielding the corresponding attention-processed first or second feature map. The structures of the channel and spatial attention modules in this case are similar to those of the foregoing embodiment and are not repeated here.
Deploying the channel attention module before the spatial attention module yields more precise attention-processed first or second feature maps, which focus more closely on the key information in the image; this helps extract more matched feature point pairs subsequently, and more precise ones.
In step S303, the similarity between pixels contained in attention-processed first and second feature maps of the same scale is determined, and matched feature point pairs between the first image and the second image are obtained on the basis of the determined pixel similarities.
In this embodiment of the application, specifically, determining the similarity between pixels contained in first and second feature maps of the same scale involves: obtaining any target feature map pair, where each target feature map pair includes an attention-processed first feature map and second feature map of the same scale; performing a feature interaction operation on the target feature map pair to obtain an interaction feature map corresponding to that pair; and feeding the interaction feature map into a pre-trained first convolutional network to obtain the separation result output by the first convolutional network, the separation result including a first separated feature map corresponding to the first feature map, a second separated feature map corresponding to the second feature map, and the similarity between the pixels contained in the first and second separated feature maps. Performing this feature interaction operation can effectively improve the accuracy of the determined pixel similarities between first and second feature maps of the same scale.
In this embodiment of the invention, the interaction feature map corresponding to any target feature map pair is a four-dimensional tensor, and the first convolutional network is a four-dimensional convolutional network. As an example, taking the aforementioned ResNet, let the feature maps after the above attention computation be denoted conv3_xA', conv4_xA', conv5_xA' and conv3_xB', conv4_xB', conv5_xB'. Feature interaction is performed between conv3_xA' and conv3_xB', between conv4_xA' and conv4_xB', and between conv5_xA' and conv5_xB'; the resulting interaction feature maps may be denoted conv3AB, conv4AB and conv5AB, where conv3AB = conv3_xA'^T conv3_xB', conv4AB = conv4_xA'^T conv4_xB' and conv5AB = conv5_xA'^T conv5_xB'. The dimensions of conv3AB, conv4AB and conv5AB are all H*W*H*W, i.e. they are all four-dimensional tensors.
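As a sketch, the per-scale interaction conv3AB = conv3_xA'^T conv3_xB' amounts to correlating every pixel of one map with every pixel of the other over the channel dimension, which a single einsum expresses; the batch dimension below is added purely for illustration:

```python
import torch

def correlate(feat_a, feat_b):
    """Channel-wise correlation of two same-scale maps -> a 4-D (H, W, H, W) tensor."""
    # feat_a, feat_b: (B, C, H, W) attention-processed maps of the same scale
    return torch.einsum('bcij,bckl->bijkl', feat_a, feat_b)

conv3AB = correlate(torch.randn(1, 256, 28, 28), torch.randn(1, 256, 28, 28))
print(conv3AB.shape)   # torch.Size([1, 28, 28, 28, 28]) -- H*W*H*W per batch item
```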
When the interaction feature maps are fed into the pre-trained first convolutional network to obtain the separation result output by it, preferably, a four-dimensional convolution is computed on each interaction feature map, and feature separation is performed on the feature maps after the four-dimensional convolution. As an example, taking the above interaction feature maps conv3AB, conv4AB and conv5AB, a four-dimensional convolution is computed on each of conv3AB, conv4AB and conv5AB to capture what the features of the first and second images have in common; the feature map output after the four-dimensional convolution has dimensions 2*H*W*C. That output is then separated: conv3AB is separated into conv3A and conv3B, conv4AB into conv4A and conv4B, and conv5AB into conv5A and conv5B; each feature map obtained by the separation has dimensions H*W*C. Deconvolution and linear interpolation can then be used so that the H*W of each separated feature map corresponds to the height and width of the original first and second images, after which the similarity between pixels at the same position in the corresponding separated feature maps is determined.
Further, when the matched feature point pairs between the first and second images are obtained on the basis of the determined pixel similarities, pixels in the first and second separated feature maps whose corresponding similarity is not less than a set threshold are determined to be matched feature point pairs. This filters out the pixels with larger similarities as matched feature point pairs instead of treating all pixels as matches, saving the computer's computing resources.
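A brief sketch of this final selection step follows, assuming cosine similarity between same-position pixels of the two upsampled separated maps; the 0.8 threshold is an illustrative value, not one fixed by this disclosure:

```python
import torch
import torch.nn.functional as F

def match_pairs(sep_a, sep_b, threshold=0.8):
    """Keep pixels whose per-position similarity is not less than the threshold."""
    # sep_a, sep_b: (C, H, W) separated maps already resized to the input image size
    sim = F.cosine_similarity(sep_a, sep_b, dim=0)        # (H, W) per-pixel similarity
    ys, xs = torch.nonzero(sim >= threshold, as_tuple=True)
    return torch.stack([xs, ys], dim=1), sim              # each (x, y) is one matched pair

pairs, sim = match_pairs(torch.randn(64, 224, 224), torch.randn(64, 224, 224))
```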
Taking the bronchial images described in the background art as an example, the method described in this embodiment extracts a large number of matched feature point pairs, shown in Figure 3B; Figure 3C shows the matched feature points obtained in another image processing instance. From Figures 1, 3B and 3C it can be seen that the matched feature point pairs extracted by the method of this embodiment are far richer than those extracted with the SIFT algorithm. Matched feature point pairs such as those shown in Figure 3B can then improve the accuracy of the estimated bronchoscope pose, benefiting subsequent bronchoscope navigation. In particular, based on how the pose data of the matched feature point pairs change across frames, the change in the bronchoscope's pose between capturing the earlier frame and capturing the later frame can be determined, which helps accurately predict the bronchoscope's current pose.
Optionally, in another implementation of the present application, when determining the similarity between pixels contained in first and second feature maps of the same scale, an attention-processed first feature map and second feature map of the same scale are selected and fed directly into a pre-trained second convolutional network, which outputs the similarity between the pixels contained in the selected first and second feature maps. This simplifies obtaining the pixel similarities between first and second feature maps of the same scale and improves processing efficiency.
Referring to Figure 4, an embodiment of the present invention provides the structure of an image registration apparatus. For ease of description, only the parts relevant to this embodiment are shown, including:
a feature extraction unit 41, configured to use a feature extraction network to perform feature extraction on a first image and a second image respectively, yielding a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
an attention processing unit 42, configured to feed the extracted first and second feature maps in turn into an attention module for processing, yielding attention-processed first and second feature maps; and
a feature point pair obtaining unit 43, configured to determine the similarity between pixels contained in attention-processed first and second feature maps of the same scale, and to obtain matched feature point pairs between the first image and the second image on the basis of the determined pixel similarities.
In this embodiment of the invention, a feature extraction network performs feature extraction on the first and second images respectively, yielding a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales; the extracted first and second feature maps are fed in turn into an attention module for processing, yielding attention-processed first and second feature maps; the similarity between pixels contained in attention-processed first and second feature maps of the same scale is determined, and matched feature point pairs between the first and second images are obtained on the basis of the determined pixel similarities, thereby improving the accuracy of image registration.
Referring to Figure 5, an embodiment of the present invention provides the structure of an image registration apparatus. For ease of description, only the parts relevant to this embodiment are shown, including:
a feature extraction unit 51, configured to use a feature extraction network to perform feature extraction on a first image and a second image respectively, yielding a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
an attention processing unit 52, configured to feed the extracted first and second feature maps in turn into an attention module for processing, yielding attention-processed first and second feature maps; and
a feature point pair obtaining unit 53, configured to determine the similarity between pixels contained in attention-processed first and second feature maps of the same scale, and to obtain matched feature point pairs between the first image and the second image on the basis of the determined pixel similarities.
Specifically, the feature extraction network includes a siamese neural network comprising a first sub-network and a second sub-network of identical structure and shared weights, and the feature extraction unit 51 includes:
a first image obtaining unit 511, configured to process the first image with the first sub-network, obtaining a plurality of first feature maps of the first image at different scales; and
a second image obtaining unit 512, configured to process the second image with the second sub-network, obtaining a plurality of second feature maps of the second image at different scales.
Further, the first and second images are adjacent frames captured by a bronchoscope.
Specifically, the attention module includes a channel attention module and a spatial attention module; when the channel attention module is deployed before the spatial attention module, the attention processing unit 52 includes:
a first processing unit 521, configured to feed either of the first and second feature maps into the channel attention module to perform a channel attention operation, and to feed the result of the channel attention operation into the spatial attention module to perform a spatial attention operation, obtaining the corresponding attention-processed feature map.
Further, the channel attention module includes a first global pooling layer, a first one-dimensional convolution layer and a first coefficient calculation layer, and the first processing unit 521 includes:
a first maximum calculation unit, configured to compute, through the first global pooling layer, the maximum of either of the first and second feature maps over the spatial dimensions, obtaining a corresponding third feature map;
a first convolution calculation unit, configured to perform, through the first one-dimensional convolution layer, a one-dimensional convolution over the channel dimension of the third feature map;
a first normalization unit, configured to normalize, through the first coefficient calculation layer, the feature map after the one-dimensional convolution, obtaining channel attention coefficients; and
a first multiplication unit, configured to process the input feature map with the obtained channel attention coefficients.
Further, the spatial attention module includes a second global pooling layer, a second one-dimensional convolution layer and a second coefficient calculation layer connected in sequence, and the first processing unit 521 further includes:
a second maximum calculation unit, configured to compute, through the second global pooling layer, the maximum of the spatial attention module's input feature map over the channel dimension, obtaining a corresponding fourth feature map;
a second convolution calculation unit, configured to perform, through the second one-dimensional convolution layer, a one-dimensional convolution over the spatial dimensions of the fourth feature map;
a second normalization unit, configured to normalize, through the second coefficient calculation layer, the features after the one-dimensional convolution, obtaining spatial attention coefficients; and
a second multiplication unit, configured to process the spatial attention module's input feature map with the spatial attention coefficients.
Optionally, in another implementation of the present application, if the attention module includes a channel attention module and a spatial attention module with the spatial attention module deployed before the channel attention module, the attention processing unit 52 includes:
a second processing unit, configured to feed either of the first and second feature maps into the spatial attention module to perform a spatial attention operation, and to feed the result of the spatial attention operation into the channel attention module to perform a channel attention operation, obtaining the corresponding attention-processed feature map.
Further, the image registration apparatus also includes:
a dimension adjustment unit, configured to use a dimension adjustment module to adjust the dimensions of the first and second feature maps obtained at the different scales, so that the first and second feature maps fed into the attention module agree in their spatial and channel dimensions.
In an embodiment of the present application, specifically, the feature point pair obtaining unit 53 includes:
a feature map pair obtaining unit, configured to obtain any target feature map pair, where each target feature map pair includes an attention-processed first feature map and second feature map of the same scale;
a feature interaction unit, configured to perform a feature interaction operation on the target feature map pair, obtaining an interaction feature map corresponding to that pair; and
a feature map separation unit, configured to feed the interaction feature map into a pre-trained first convolutional network, obtaining the separation result output by the first convolutional network, the separation result including a first separated feature map corresponding to the first feature map, a second separated feature map corresponding to the second feature map, and the similarity between the pixels contained in the first and second separated feature maps.
Further, the feature point pair obtaining unit 53 also includes:
a feature point pair determining unit, configured to determine pixels in the first and second separated feature maps whose corresponding similarity is not less than a set threshold to be matched feature point pairs.
In another embodiment of the present application, specifically, the feature point pair obtaining unit 53 includes:
a feature map selection unit, configured to select an attention-processed first feature map and second feature map of the same scale; and
a similarity obtaining unit, configured to feed the selected first and second feature maps into a pre-trained second convolutional network, obtaining the similarity, output by the second convolutional network, between the pixels contained in the selected first and second feature maps.
In this embodiment of the invention, the units or modules of the image registration apparatus may be implemented by corresponding hardware or software units or modules; they may be independent software or hardware units or modules, or may be integrated into a single software or hardware unit or module, without this limiting the present invention. For the specific implementation of the units or modules of the image registration apparatus, reference may be made to the description of the foregoing method embodiments, which is not repeated here.
Referring to Figure 6, a schematic diagram of the hardware structure of an electronic apparatus provided by an embodiment of the present application is shown.
Exemplarily, the electronic apparatus may be any of various types of computer system devices, whether fixed, mobile or portable, performing wireless or wired communication. Specifically, the electronic apparatus may be a desktop computer, a server, a mobile phone or smartphone (for example, an iPhone™-based or Android™-based phone), a portable gaming device (for example a Nintendo DS™, PlayStation Portable™, Gameboy Advance™ or iPhone™), a laptop, a PDA, a portable Internet device, a portable medical device, a smart camera, a music player or data storage device, another handheld device, or an accessory such as a watch, earphones or a pendant; the electronic apparatus may also be another wearable device (for example, electronic glasses, electronic clothing, an electronic bracelet, an electronic necklace, or another head-mounted device (HMD)).
As shown in Figure 6, the electronic apparatus 6 may include a control circuit, which may include a storage and processing circuit 61. The storage and processing circuit 61 may include memory, for example hard-drive storage, non-volatile memory (such as flash memory or other electronically programmable erase-limited memory used to form solid-state drives), volatile memory (such as static or dynamic random-access memory), and so on; the embodiments of the present application are not limited in this respect. The processing circuitry in the storage and processing circuit 61 may be used to control the operation of the electronic apparatus 6. The processing circuitry may be implemented on the basis of one or more microprocessors, microcontrollers, digital signal processors, baseband processors, power management units, audio codec chips, application-specific integrated circuits, display driver integrated circuits, and so on.
The storage and processing circuit 61 may be used to run software in the electronic apparatus 6, for example Internet browsing applications, Voice over Internet Protocol (VoIP) telephone call applications, email applications, media playback applications, operating system functions, and so on. This software may be used to perform control operations such as camera-based image acquisition, ambient-light measurement based on an ambient light sensor, proximity measurement based on a proximity sensor, information display functions implemented with status indicators such as light-emitting-diode status lights, touch event detection based on touch sensors, functions associated with displaying information on multiple (for example, layered) displays, operations associated with performing wireless communication functions, operations associated with collecting and producing audio signals, control operations associated with collecting and processing button press event data, and other functions in the electronic apparatus 6; the embodiments of the present application are not limited in this respect.
Further, the memory stores executable program code, and the processor coupled to the memory calls the executable program code stored in the memory to execute the image registration method described in the foregoing embodiments, for example the method described in steps S201–S203 of Figure 2.
The executable program code includes the units or modules of the image registration apparatus described in the foregoing embodiments, for example modules 41–43 in Figure 4. For the specific process by which the above units or modules realize their respective functions, reference may be made to the related description of the above image registration apparatus embodiments, which is not repeated here.
Further, an embodiment of the present application also provides a non-transitory computer-readable storage medium, which may be provided in the server of the foregoing embodiments and which stores a computer program that, when executed by a processor, implements the image registration method described in the foregoing image registration method embodiments.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
Those skilled in the art will appreciate that the modules/units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods for each specific application to implement the described functions, but such implementations should not be considered to exceed the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the apparatus/terminal embodiments described above are merely illustrative; the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the present invention may realize all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor it can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, certain intermediate forms, and so on. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard drive, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random-access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in a computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice within a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent substitutions for some of their technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the scope of protection of the present invention.
Claims (14)
- An image registration method, characterized in that the method comprises: using a feature extraction network to perform feature extraction on a first image and a second image respectively, obtaining a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales; feeding the extracted first and second feature maps in turn into an attention module for processing, obtaining attention-processed first and second feature maps; and determining the similarity between pixels contained in attention-processed first and second feature maps of the same scale, and obtaining matched feature point pairs between the first image and the second image on the basis of the determined pixel similarities.
- The method of claim 1, characterized in that the feature extraction network comprises a siamese neural network; the siamese neural network comprises a first sub-network and a second sub-network of identical structure and shared weights; and the step of using a feature extraction network to perform feature extraction on the first image and the second image respectively comprises: processing the first image with the first sub-network, obtaining a plurality of first feature maps of the first image at different scales; and processing the second image with the second sub-network, obtaining a plurality of second feature maps of the second image at different scales.
- The method of claim 1, characterized in that the attention module comprises a channel attention module and a spatial attention module; and when the channel attention module is deployed before the spatial attention module, the step of feeding the extracted first and second feature maps in turn into the attention module for processing to obtain attention-processed first and second feature maps comprises: feeding either of the first and second feature maps into the channel attention module to perform a channel attention operation, and feeding the result of the channel attention operation into the spatial attention module to perform a spatial attention operation, obtaining the corresponding attention-processed feature map.
- The method of claim 3, characterized in that the channel attention module comprises a first global pooling layer, a first one-dimensional convolution layer and a first coefficient calculation layer; and the step of feeding either of the first and second feature maps into the channel attention module to perform the channel attention operation comprises: computing, through the first global pooling layer, the maximum of that feature map over the spatial dimensions, obtaining a corresponding third feature map; performing, through the first one-dimensional convolution layer, a one-dimensional convolution over the channel dimension of the third feature map; normalizing, through the first coefficient calculation layer, the feature map after the one-dimensional convolution, obtaining channel attention coefficients; and processing the input feature map with the obtained channel attention coefficients.
- The method of claim 3, characterized in that the spatial attention module comprises a second global pooling layer, a second one-dimensional convolution layer and a second coefficient calculation layer connected in sequence; and the step of feeding the result of the channel attention operation into the spatial attention module to perform the spatial attention operation comprises: computing, through the second global pooling layer, the maximum of the spatial attention module's input feature map over the channel dimension, obtaining a corresponding fourth feature map; performing, through the second one-dimensional convolution layer, a one-dimensional convolution over the spatial dimensions of the fourth feature map; normalizing, through the second coefficient calculation layer, the features after the one-dimensional convolution, obtaining spatial attention coefficients; and processing the spatial attention module's input feature map with the spatial attention coefficients.
- The method of claim 1, characterized in that the attention module comprises a channel attention module and a spatial attention module; and when the spatial attention module is deployed before the channel attention module, the step of feeding the extracted first and second feature maps in turn into the attention module for processing to obtain attention-processed first and second feature maps comprises: feeding either of the first and second feature maps into the spatial attention module to perform a spatial attention operation, and feeding the result of the spatial attention operation into the channel attention module to perform a channel attention operation, obtaining the corresponding attention-processed feature map.
- The method of claim 1, characterized in that the method further comprises: using a dimension adjustment module to adjust the dimensions of the first and second feature maps obtained at the different scales, so that the first and second feature maps fed into the attention module agree in their spatial and channel dimensions.
- The method of claim 1, characterized in that the step of determining the similarity between pixels contained in attention-processed first and second feature maps of the same scale comprises: obtaining any target feature map pair, where each target feature map pair comprises an attention-processed first feature map and second feature map of the same scale; performing a feature interaction operation on the target feature map pair, obtaining an interaction feature map corresponding to that pair; and feeding the interaction feature map into a pre-trained first convolutional network, obtaining a separation result output by the first convolutional network, the separation result comprising a first separated feature map corresponding to the first feature map, a second separated feature map corresponding to the second feature map, and the similarity between the pixels contained in the first and second separated feature maps.
- The method of claim 8, characterized in that the step of obtaining matched feature point pairs between the first image and the second image on the basis of the determined pixel similarities comprises: determining pixels in the first and second separated feature maps whose corresponding similarity is not less than a set threshold to be the matched feature point pairs.
- The method of claim 1, characterized in that the step of determining the similarity between pixels contained in attention-processed first and second feature maps of the same scale comprises: selecting an attention-processed first feature map and second feature map of the same scale; and feeding the selected first and second feature maps into a pre-trained second convolutional network, obtaining the similarity, output by the second convolutional network, between the pixels contained in the selected first and second feature maps.
- The method of claim 1, characterized in that the first image and the second image are adjacent frames captured by a bronchoscope.
- An image registration apparatus, characterized in that the apparatus comprises: a feature extraction unit, configured to use a feature extraction network to perform feature extraction on a first image and a second image respectively, obtaining a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales; an attention processing unit, configured to feed the extracted first and second feature maps in turn into an attention module for processing, obtaining attention-processed first and second feature maps; and a feature point pair obtaining unit, configured to determine the similarity between pixels contained in attention-processed first and second feature maps of the same scale, and to obtain matched feature point pairs between the first image and the second image on the basis of the determined pixel similarities.
- An electronic apparatus, comprising a memory and a processor; the memory stores executable program code; and the processor, coupled to the memory, calls the executable program code stored in the memory to execute the method of any one of claims 1 to 11.
- A non-transitory computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 11.