CN117523226A - Image registration method, device and storage medium


Info

Publication number
CN117523226A
Authority
CN
China
Prior art keywords
feature, feature map, image, attention, processing
Prior art date
Legal status
Pending
Application number
CN202210900039.3A
Other languages
Chinese (zh)
Inventor
李楠宇
陈日清
徐宏
余坤璋
刘润南
苏晨晖
Current Assignee
Hangzhou Kunbo Biotechnology Co Ltd
Original Assignee
Hangzhou Kunbo Biotechnology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Kunbo Biotechnology Co Ltd filed Critical Hangzhou Kunbo Biotechnology Co Ltd
Priority to CN202210900039.3A
Priority to PCT/CN2023/105843
Publication of CN117523226A


Classifications

    • G06N3/02 Neural networks
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Image registration using feature-based methods
    • G06T7/337 Image registration using feature-based methods involving reference images or patches
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]


Abstract

The invention relates to the technical field of image processing and provides an image registration method, an image registration device, and a storage medium. The method comprises the following steps: performing feature extraction on a first image and a second image respectively with a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales; inputting the extracted first feature maps and second feature maps into an attention module for processing to obtain attention-processed first feature maps and second feature maps; and determining the similarity between the pixel points contained in the attention-processed first feature map and second feature map of the same scale, and obtaining matched feature point pairs between the first image and the second image based on the determined similarity between pixel points, thereby improving the accuracy of image registration.

Description

Image registration method, device and storage medium
Technical Field
The present invention belongs to the technical field of image processing, and in particular, relates to an image registration method, an image registration device, and a storage medium.
Background
Inter-frame image registration is currently performed with conventional algorithms such as the Scale-Invariant Feature Transform (SIFT). With such traditional algorithms, however, only a small number of matching points can be found, and it is often difficult to estimate the pose change of the corresponding image capturing apparatus from so few points; the position and pose change of the image capturing apparatus therefore cannot be obtained accurately.
Disclosure of Invention
The invention aims to provide an image registration method, an image registration device, and a storage medium, so as to solve the problem of the low accuracy of existing inter-frame registration methods.
In one aspect, the present invention provides an image registration method, the method comprising:
performing feature extraction on a first image and a second image respectively with a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
inputting the extracted first feature maps and second feature maps into an attention module for processing to obtain attention-processed first feature maps and second feature maps;
and determining the similarity between the pixel points contained in the attention-processed first feature map and second feature map of the same scale, and obtaining matched feature point pairs between the first image and the second image based on the determined similarity between pixel points.
In another aspect, the present invention provides an image registration apparatus, the apparatus comprising:
a feature extraction unit, configured to perform feature extraction on the first image and the second image respectively with a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
an attention processing unit, configured to input the extracted first feature maps and second feature maps into an attention module for processing and obtain attention-processed first feature maps and second feature maps; and
a feature point pair acquisition unit, configured to determine the similarity between the pixel points contained in the attention-processed first feature map and second feature map of the same scale, and obtain matched feature point pairs between the first image and the second image based on the determined similarity between pixel points.
In another aspect, the present invention further provides an electronic device, including: a memory and a processor;
the memory stores executable program code;
the processor, coupled to the memory, invokes the executable program code stored in the memory to perform the image registration method as provided by the above embodiments.
In another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an image registration method as provided by the above embodiments.
According to the invention, a feature extraction network performs feature extraction on a first image and a second image respectively, yielding a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales. The extracted first and second feature maps are input into an attention module for processing to obtain attention-processed first and second feature maps. The similarity between the pixel points contained in the attention-processed first and second feature maps of the same scale is then determined, and matched feature point pairs between the first image and the second image are obtained based on that similarity. In this way the number of matched feature point pairs obtained is increased markedly, which improves the accuracy of image registration.
Drawings
FIG. 1 is a schematic diagram of matched feature point pairs of a bronchial image obtained with the SIFT algorithm;
FIG. 2 is a flowchart of an image registration method according to an embodiment of the present application;
FIG. 3A is a flowchart of an image registration method according to an embodiment of the present application;
FIG. 3B is a schematic diagram of matched feature point pairs of a bronchial image obtained with the image registration method described in embodiments of the present application;
FIG. 3C is a schematic diagram of matched feature point pairs of a bronchial image obtained with the image registration method described in embodiments of the present application;
FIG. 3D is a schematic diagram of the attention module in an image registration method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an image registration apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an image registration apparatus according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the present application.
The following describes in detail the implementation of the present invention in connection with specific embodiments:
At present, inter-frame registration of images is generally computed with traditional algorithms such as the Scale-Invariant Feature Transform (SIFT). Taking a bronchoscope as the image acquisition apparatus, as shown in fig. 1, applying the SIFT algorithm to register two consecutive frames acquired by the bronchoscope yields only the 3 pairs of matching points indicated by the connecting lines in fig. 1. It is difficult to estimate the pose change of the bronchoscope from these 3 pairs of matching points, which reduces the navigation accuracy of the bronchoscope.
Referring to fig. 2, an embodiment of the present invention provides an implementation flow of an image registration method; for convenience of explanation, only the portions related to the embodiment are shown, described in detail below:
in step S201, feature extraction is performed on the first image and the second image with a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
the embodiment of the invention is suitable for an electronic device, which can be a mobile phone, a tablet computer, wearable equipment, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (personal digital assistant, PDA) and other equipment, and the embodiment of the invention does not limit the specific type of the electronic device.
In the embodiment of the present invention, the first image and the second image may be images acquired by an image acquisition apparatus, where the image acquisition apparatus may be any apparatus with an image acquisition function, for example a bronchoscope, a camera, or a video camera, which is not limited in this specification.
In the embodiment of the invention, feature extraction is performed on the first image and the second image respectively with a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales. The first image and the second image are two frames from which matching point pairs need to be extracted or which need to be registered. Further, the first image and the second image may be adjacent frames captured by a bronchoscope, for estimating the pose of the bronchoscope. Of course, the first image and the second image may also be images acquired by the bronchoscope at a preset interval, and the preset interval may be set according to actual requirements, which is not limited in this specification.
In the embodiment of the invention, the feature extraction network can extract features at multiple scales, such as shallow and deep scales, from the first image. A first feature map at a shallow scale has passed through fewer convolutions and has a smaller receptive field, so it generally contains more of the texture features of the first image; a first feature map at a deep scale has passed through more convolutions and has a larger receptive field, so it generally contains more of the semantic information of the first image, which may include shape features. The process of extracting multi-scale features, shallow, deep, and so on, from the second image with the feature extraction network is similar to that for the first image and is not repeated here.
In step S202, the extracted first feature map and second feature map are sequentially input into an attention module for processing, so as to obtain a first feature map and a second feature map after attention processing;
in the embodiment of the invention, in order to focus on the key information that benefits image registration and to remove the interference of noise, the extracted first feature maps and second feature maps are input into the attention module for processing, yielding the attention-processed first feature maps and second feature maps.
In step S203, the similarity between the pixel points contained in the attention-processed first feature map and second feature map of the same scale is determined, and matched feature point pairs between the first image and the second image are obtained based on the determined similarity between pixel points.
According to the embodiment of the invention, feature extraction is performed on the first image and the second image respectively with the feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales; the extracted first and second feature maps are input into the attention module for processing to obtain attention-processed first and second feature maps; the similarity between the pixel points contained in the attention-processed first and second feature maps of the same scale is determined; and matched feature point pairs between the first image and the second image are obtained based on the determined similarity. Through the multi-scale feature maps and the attention processing of the feature maps, the accuracy of image registration is improved.
Referring to fig. 3A, an embodiment of the present invention provides an implementation flow of an image registration method; for convenience of explanation, only the portions related to the embodiment are shown, described in detail below:
in step S301, feature extraction is performed on a first image and a second image with a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
in the embodiment of the invention, feature extraction is performed on the first image and the second image respectively with a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales. The first image and the second image are two frames from which matching point pairs need to be extracted. Further, the first image and the second image may be adjacent frames captured by a bronchoscope, for estimating the pose of the bronchoscope. Of course, the first image and the second image may also be images acquired by the bronchoscope at a preset interval, and the preset interval may be set according to actual requirements, which is not limited in this specification.
In a specific embodiment of the present application, the feature extraction network comprises a twin (siamese) neural network that performs feature extraction on the first image and the second image respectively. Furthermore, the twin neural network comprises a first sub-network and a second sub-network, which adopt the same or similar convolutional network structures. When multi-scale features of the first image and the second image are extracted through the twin neural network, multi-scale feature extraction is performed on the first image through the first sub-network to obtain a plurality of first feature maps of the first image at different scales, and on the second image through the second sub-network to obtain a plurality of second feature maps of the second image at different scales. More specifically, the plurality of first feature maps comprises the output feature maps of the last N convolution blocks of the first sub-network, and the plurality of second feature maps comprises the output feature maps of the last N convolution blocks of the second sub-network.
In a preferred embodiment of the present application, the twin neural network includes a first sub-network and a second sub-network built on a ResNet backbone; a ResNet18 architecture may be adopted, or a deeper architecture such as ResNet34 or ResNet101. Suppose the ResNet networks in the first and second sub-networks each comprise five convolution blocks; the extracted first feature maps then comprise the output feature maps of the third, fourth, and fifth convolution blocks of the first sub-network, and the extracted second feature maps comprise the output feature maps of the third, fourth, and fifth convolution blocks of the second sub-network. Multi-scale feature extraction from the different frames of the first image and the second image through the ResNet backbone in this way improves the accuracy of the subsequent extraction of matched feature point pairs. As an example, the five convolution blocks of the ResNet network may be denoted conv1_x, conv2_x, conv3_x, conv4_x, and conv5_x; the first feature maps of the first image extracted by the 3 convolution blocks conv3_x, conv4_x, and conv5_x of the first sub-network may be denoted conv3_xA, conv4_xA, and conv5_xA, and the second feature maps of the second image extracted by the 3 convolution blocks conv3_x, conv4_x, and conv5_x of the second sub-network may be denoted conv3_xB, conv4_xB, and conv5_xB; a code sketch follows.
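A minimal PyTorch sketch of the twin-network extraction just described follows; mapping torchvision's layer2 to layer4 onto conv3_x to conv5_x and sharing one set of weights between the two sub-networks are illustrative assumptions, not details taken verbatim from the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SiameseExtractor(nn.Module):
    """Twin multi-scale feature extractor with shared ResNet18 blocks."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        # Shared stem and convolution blocks: weight sharing makes the two
        # sub-networks identical in structure and parameters.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.conv2_x = backbone.layer1   # torchvision layer1 ~ conv2_x
        self.conv3_x = backbone.layer2   # ~ conv3_x
        self.conv4_x = backbone.layer3   # ~ conv4_x
        self.conv5_x = backbone.layer4   # ~ conv5_x

    def forward_one(self, x):
        x = self.conv2_x(self.stem(x))
        c3 = self.conv3_x(x)    # shallow scale: richer texture features
        c4 = self.conv4_x(c3)   # intermediate scale
        c5 = self.conv5_x(c4)   # deep scale: richer semantic information
        return c3, c4, c5

    def forward(self, img_a, img_b):
        # The same blocks process both frames, e.g. adjacent bronchoscope images.
        return self.forward_one(img_a), self.forward_one(img_b)
```

Calling forward on a pair of frames returns the three first feature maps and the three second feature maps used in the later steps.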
Optionally, in another embodiment of the present application, after the plurality of first feature maps of the first image at different scales and the plurality of second feature maps of the second image at different scales are obtained in step S301, a dimension-adjustment module may be used to adjust the dimensions of the first and second feature maps corresponding to the different scales, so that the obtained first and second feature maps are consistent in the spatial dimensions and the channel dimension. This makes it convenient to concatenate same-scale first or second feature maps or to perform feature interaction operations on them, and it also simplifies the processing by the subsequent attention module. Specifically, the channel dimensions of the first and second feature maps may be unified by one-dimensional convolution, and their spatial dimensions may be unified by deconvolution, as sketched below.
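A minimal sketch of the dimension-adjustment module, assuming the one-dimensional convolution that unifies the channel dimension can be rendered as a 1×1 convolution and the deconvolution as a transposed convolution; the target channel count of 256 and the upsampling factor are illustrative assumptions.

```python
import torch.nn as nn

class DimAdjust(nn.Module):
    """Unify the channel and spatial dimensions of one scale's feature map."""
    def __init__(self, in_channels, out_channels=256, up_factor=1):
        super().__init__()
        # 1x1 convolution unifies the channel dimension across scales.
        self.channel_unify = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Deconvolution upsamples deeper (smaller) maps so that all scales
        # share one spatial size; up_factor=1 leaves the reference scale as is.
        self.spatial_unify = (nn.ConvTranspose2d(out_channels, out_channels,
                                                 kernel_size=up_factor,
                                                 stride=up_factor)
                              if up_factor > 1 else nn.Identity())

    def forward(self, x):
        return self.spatial_unify(self.channel_unify(x))
```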
In step S302, any one of the first feature maps and second feature maps is input into the channel attention module to perform a channel attention operation, and the result of the channel attention operation is input into the spatial attention module to perform a spatial attention operation, obtaining the corresponding attention-processed feature map;
In the embodiment of the invention, the attention module comprises a channel attention module and a spatial attention module, with the channel attention module placed before the spatial attention module. In this case, when the extracted first and second feature maps are input into the attention module for processing: the extracted first feature map is input into the channel attention module, which performs a channel attention operation on it; the result of that operation is then input into the spatial attention module, which performs a spatial attention operation on it, yielding the corresponding attention-processed first feature map. Likewise, the extracted second feature map is input into the channel attention module, which performs a channel attention operation on it; the result is input into the spatial attention module, which performs a spatial attention operation on it, yielding the corresponding attention-processed second feature map.
Specifically, the channel attention module may include a first global pooling layer, a first one-dimensional convolution layer, and a first coefficient calculation layer. Inputting any one of the first feature map and the second feature map into the channel attention module to perform the channel attention operation then comprises: calculating, through the first global pooling layer, the maximum values of that feature map over the spatial dimensions, which reduces the dimensionality of the feature map and mitigates overfitting, to obtain a corresponding third feature map; performing a one-dimensional convolution over the channel dimension of the third feature map through the first one-dimensional convolution layer; normalizing the convolved feature map through the first coefficient calculation layer to obtain a channel attention coefficient; and finally processing the input feature map with the obtained channel attention coefficient to obtain the result of the channel attention operation, which improves the accuracy of the channel attention calculation. For example, the channel attention coefficient is multiplied with whichever of the first and second feature maps was input into the channel attention module, yielding that feature map after the channel attention operation. More specifically, suppose the input feature map X of the channel attention module has dimensions H×W×C, where H is the height of the feature map, W its width, and C the number of channels. The largest K values of X over the H×W dimensions are calculated by a top-K max pooling operation to obtain an output Z with dimensions C×K; one-dimensional convolutions are then computed over the C dimension and over the K dimension respectively, and finally a Softmax function normalizes the result into the channel attention coefficient. Multiplying the channel attention coefficient by the original feature map X yields a new feature map, which is the output of the channel attention module, i.e. the input of the spatial attention module.
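A minimal sketch of this channel attention computation in PyTorch follows; the text fixes neither the number K of retained maxima nor the one-dimensional kernel sizes, so K=4 and kernel size 3 are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Top-K max pooling over H*W, one-dimensional convolutions over the
    C and K dimensions, Softmax, then reweighting of the input feature map.
    K=4 and kernel size 3 are assumed hyper-parameters."""
    def __init__(self, k=4):
        super().__init__()
        self.k = k
        # Convolve along the C dimension (the K maxima act as channels) ...
        self.conv_over_c = nn.Conv1d(k, k, kernel_size=3, padding=1)
        # ... and along the K dimension, collapsing the K maxima to one value.
        self.conv_over_k = nn.Conv1d(1, 1, kernel_size=k)

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, _, _ = x.shape
        z = x.flatten(2).topk(self.k, dim=-1).values     # (B, C, K): C x K map
        z = self.conv_over_c(z.transpose(1, 2)).transpose(1, 2)  # conv over C
        z = self.conv_over_k(z.reshape(b * c, 1, self.k))        # conv over K
        coeff = F.softmax(z.reshape(b, c), dim=1)        # channel attention coeff
        return x * coeff.view(b, c, 1, 1)                # reweight the input map
```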
Specifically, the spatial attention module may include a second global pooling layer, a second one-dimensional convolution layer, and a second coefficient calculation layer connected in sequence. Inputting the result of the channel attention operation into the spatial attention module to perform the spatial attention operation then comprises: calculating, through the second global pooling layer, the maximum values of the input feature map of the spatial attention module over the channel dimension to obtain a corresponding fourth feature map; performing a one-dimensional convolution over the spatial dimensions of the fourth feature map through the second one-dimensional convolution layer; normalizing the convolved features through the second coefficient calculation layer to obtain a spatial attention coefficient; and processing the input feature map of the spatial attention module with the spatial attention coefficient to obtain the corresponding attention-processed first or second feature map, which improves the accuracy of the spatial attention calculation. For example, the spatial attention coefficient is multiplied with the input feature map of the spatial attention module to obtain the corresponding attention-processed first or second feature map. More specifically, suppose the input feature map of the spatial attention module is denoted X' and has dimensions H×W×C. The maximum values of X' over the C dimension are first calculated through a global pooling operation to obtain an output Z'; one-dimensional convolutions are then computed over the two dimensions H and W of Z' respectively, and finally a Softmax function normalizes the convolved features into the spatial attention coefficient. Multiplying the spatial attention coefficient by the original features X' yields the new feature output, i.e. the attention-processed first or second feature map. As an example, the structure of the attention module may be seen in fig. 3D.
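Continuing the sketch, the spatial attention module can be rendered as below, with the two one-dimensional convolutions realised as (1, k) and (k, 1) kernels over the width and height; the kernel size is again an assumption. The usage lines show the step S302 ordering, channel attention first and then spatial attention (ChannelAttention is the class from the previous sketch).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Channel-wise max pooling, two one-dimensional convolutions over W and
    H, Softmax over positions, then reweighting of the input feature map.
    Kernel size 3 is an assumption."""
    def __init__(self, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # (1, k) and (k, 1) kernels realise 1-D convolutions over W and H.
        self.conv_w = nn.Conv2d(1, 1, (1, kernel_size), padding=(0, pad))
        self.conv_h = nn.Conv2d(1, 1, (kernel_size, 1), padding=(pad, 0))

    def forward(self, x):                              # x: (B, C, H, W)
        b, _, h, w = x.shape
        z = x.max(dim=1, keepdim=True).values          # (B, 1, H, W): max over C
        z = self.conv_h(self.conv_w(z))                # 1-D convs over W, then H
        coeff = F.softmax(z.flatten(2), dim=-1).view(b, 1, h, w)
        return x * coeff                               # reweight spatially

if __name__ == "__main__":
    # Step S302 ordering: channel attention first, then spatial attention.
    feat = torch.randn(1, 256, 32, 32)
    out = SpatialAttention()(ChannelAttention()(feat))
    print(out.shape)  # torch.Size([1, 256, 32, 32])
```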
Optionally, in another embodiment of the present application, if the attention module includes a channel attention module and a spatial attention module and the spatial attention module is placed before the channel attention module, then when the extracted first and second feature maps are input into the attention module for processing: any one of the first feature map and the second feature map is input into the spatial attention module to perform the spatial attention operation, and the result of the spatial attention operation is input into the channel attention module to perform the channel attention operation, obtaining the corresponding attention-processed first or second feature map. The structures of the channel attention module and the spatial attention module are similar to those of the foregoing embodiments and are not repeated here.
Placing the channel attention module before the spatial attention module yields more accurate attention-processed first and second feature maps that focus on the key information in the image, which helps to extract more matched feature point pairs, and more accurate ones, in the subsequent steps.
In step S303, the similarity between the pixel points contained in the attention-processed first feature map and second feature map of the same scale is determined, and matched feature point pairs between the first image and the second image are obtained based on the determined similarity between pixel points.
In the embodiment of the present application, specifically, determining the similarity between the pixel points contained in same-scale first and second feature maps comprises: obtaining any target feature map pair, where each target feature map pair comprises an attention-processed first feature map and second feature map of the same scale; performing a feature interaction operation on the target feature map pair to obtain the interaction feature map corresponding to that pair; and inputting the interaction feature map into a first convolutional network trained in advance to obtain the separation result output by the first convolutional network, where the separation result comprises a first separated feature map corresponding to the first feature map, a second separated feature map corresponding to the second feature map, and the similarity between the pixel points contained in the first and second separated feature maps. Performing the feature interaction operation effectively improves the accuracy of the determined similarity between the pixel points contained in same-scale first and second feature maps.
In the embodiment of the invention, the interaction feature map corresponding to any target feature map pair is a four-dimensional tensor, and the first convolutional network is a four-dimensional convolutional network. By way of example, taking the ResNet network described above, suppose the attention-processed feature maps are denoted conv3_xA', conv4_xA', conv5_xA' and conv3_xB', conv4_xB', conv5_xB'. Feature interaction is performed between conv3_xA' and conv3_xB', between conv4_xA' and conv4_xB', and between conv5_xA' and conv5_xB'; the interaction feature maps obtained may be denoted conv3AB, conv4AB, and conv5AB, where conv3AB = (conv3_xA')^T · conv3_xB', conv4AB = (conv4_xA')^T · conv4_xB', and conv5AB = (conv5_xA')^T · conv5_xB'. The dimensions of conv3AB, conv4AB, and conv5AB are all H×W×H×W, i.e. all three are four-dimensional tensors.
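A minimal sketch of this feature interaction, assuming the transposed product is taken over the channel dimension so that every position of the first map is correlated with every position of the second:

```python
import torch

def feature_interaction(feat_a, feat_b):
    """(B, C, H, W) x (B, C, H, W) -> interaction tensor of shape
    (B, H, W, H, W), i.e. the four-dimensional tensor described above."""
    b, c, h, w = feat_a.shape
    a = feat_a.flatten(2)                         # (B, C, H*W)
    bmat = feat_b.flatten(2)                      # (B, C, H*W)
    corr = torch.einsum('bci,bcj->bij', a, bmat)  # (A')^T * B' over channels
    return corr.view(b, h, w, h, w)
```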
When the interaction feature maps are input into the pre-trained first convolutional network to obtain the separation result output by that network, preferably, a four-dimensional convolution is computed on each interaction feature map, and feature separation is then performed on the convolved feature maps. Taking the interaction feature maps conv3AB, conv4AB, and conv5AB as an example, four-dimensional convolutions are computed on conv3AB, conv4AB, and conv5AB respectively to capture the correspondences of feature points between the first image and the second image. The output of the four-dimensional convolution is then separated: conv3AB is separated into conv3A and conv3B, conv4AB into conv4A and conv4B, and conv5AB into conv5A and conv5B, each separated feature map having dimensions H×W. Deconvolution and linear interpolation then bring the H×W of each separated feature map into correspondence with the length and width of the first image and the second image, so that the similarity between the pixel points at the same positions in the separated feature maps can be determined.
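The separation step can be sketched as follows. PyTorch has no native four-dimensional convolution operator, so the sketch omits that stage and shows only an assumed separation rule (maximum response over the other image's positions) followed by interpolation back to the images' length and width.

```python
import torch
import torch.nn.functional as F

def separate(corr, image_hw):
    """corr: (B, H, W, H, W) interaction tensor. Returns two separated score
    maps, upsampled by linear interpolation to image_hw, the length and
    width of the original images. Taking the maximum over the other image's
    positions is an assumption, not a rule specified in the text."""
    b, h, w = corr.shape[0], corr.shape[1], corr.shape[2]
    sep_a = corr.reshape(b, h, w, -1).amax(dim=-1)   # (B, H, W) for image A
    sep_b = corr.reshape(b, -1, h, w).amax(dim=1)    # (B, H, W) for image B
    up = lambda m: F.interpolate(m.unsqueeze(1), size=image_hw,
                                 mode='bilinear',
                                 align_corners=False).squeeze(1)
    return up(sep_a), up(sep_b)                      # each (B, H0, W0)
```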
Further, when the matched feature point pairs between the first image and the second image are obtained based on the determined similarity between pixel points, the pixel points contained in the first separated feature map and the second separated feature map whose similarity is not smaller than a set threshold are determined as matched feature point pairs. Pixel points with higher similarity can thus be screened out to serve as matched feature point pairs; not all pixel points need serve as matched feature point pairs, which saves the computing resources of the computer.
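A minimal sketch of this screening step, assuming the separated maps are compared position-wise and noting that the threshold of 0.5 is illustrative:

```python
import torch

def matched_point_pairs(sep_a, sep_b, threshold=0.5):
    """sep_a, sep_b: (H, W) separated similarity maps for one image pair.
    Keeps the pixel positions whose similarity is not smaller than the set
    threshold; the 0.5 value and the position-wise pairing are assumptions."""
    sim = torch.minimum(sep_a, sep_b)                 # position-wise similarity
    ys, xs = torch.nonzero(sim >= threshold, as_tuple=True)
    # Each retained position yields one matched feature point pair at the
    # same coordinates in the first and second separated feature maps.
    return [((x.item(), y.item()), (x.item(), y.item()))
            for x, y in zip(xs, ys)]
```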
Taking the bronchial images described in the background as an example, a large number of matched feature point pairs can be extracted with the method described in this embodiment; the obtained matched feature point pairs are shown in fig. 3B, and fig. 3C shows the matched feature points obtained in another image processing example. As can be seen from fig. 1, fig. 3B, and fig. 3C, the matched feature point pairs extracted with the method described in this embodiment are far more abundant than those extracted with the SIFT algorithm. The accuracy of the determined pose of the bronchoscope can then be improved using the matched feature point pairs shown in fig. 3B, which benefits subsequent navigation of the bronchoscope. The change in the pose data of the bronchoscope between acquiring the previous frame and acquiring the subsequent frame can be determined from the change in the position data of the matched feature point pairs across the different frames, which helps to predict the current pose of the bronchoscope accurately.
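The patent does not detail how the pose change is computed from the matched pairs; as one standard illustration, matched point pairs can be turned into a relative camera pose with OpenCV's essential-matrix routines (the use of these functions here is an assumption about how the matches might be consumed, not the patent's method):

```python
import cv2

def estimate_pose_change(pts_prev, pts_curr, camera_matrix):
    """pts_prev, pts_curr: (N, 2) float32 arrays of matched pixel coordinates
    in the previous and subsequent frames; camera_matrix: 3x3 intrinsics.
    One standard way to turn matched pairs into a relative camera pose."""
    E, mask = cv2.findEssentialMat(pts_prev, pts_curr, camera_matrix,
                                   method=cv2.RANSAC, prob=0.999,
                                   threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, camera_matrix,
                                 mask=mask)
    return R, t  # rotation matrix and unit-scale translation direction
```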
Optionally, in another embodiment of the present application, when determining the similarity between the pixel points contained in same-scale first and second feature maps, an attention-processed first feature map and second feature map of the same scale are selected and input directly into a second convolutional network trained in advance, so as to obtain, as output by the second convolutional network, the similarity between the pixel points contained in the selected first and second feature maps. This simplifies obtaining the similarity between the pixel points contained in same-scale first and second feature maps and improves processing efficiency.
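A minimal sketch of this alternative path; the patent does not describe the second convolutional network's internals, so the concatenation of the selected pair and the layer sizes below are assumptions:

```python
import torch
import torch.nn as nn

class DirectSimilarityNet(nn.Module):
    """The selected pair of same-scale, attention-processed feature maps is
    concatenated and mapped to a per-pixel similarity map; the architecture
    is assumed, since the text only states that the pair is input directly."""
    def __init__(self, channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid())  # similarity in [0, 1] per pixel position

    def forward(self, feat_a, feat_b):
        return self.net(torch.cat([feat_a, feat_b], dim=1))  # (B, 1, H, W)
```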
Referring to fig. 4, an embodiment of the present invention provides the structure of an image registration apparatus; for convenience of explanation, only the portions related to the embodiment are shown, including:
a feature extraction unit 41, configured to perform feature extraction on the first image and the second image with a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
an attention processing unit 42, configured to input the extracted first feature maps and second feature maps into an attention module for processing and obtain attention-processed first feature maps and second feature maps; and
a feature point pair acquisition unit 43, configured to determine the similarity between the pixel points contained in the attention-processed first feature map and second feature map of the same scale, and obtain matched feature point pairs between the first image and the second image based on the determined similarity between pixel points.
According to the embodiment of the invention, feature extraction is performed on the first image and the second image respectively with the feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales; the extracted first and second feature maps are input into the attention module for processing to obtain attention-processed first and second feature maps; the similarity between the pixel points contained in the attention-processed first and second feature maps of the same scale is determined; and matched feature point pairs between the first image and the second image are obtained based on the determined similarity, thereby improving the accuracy of image registration.
Referring to fig. 5, an embodiment of the present invention provides the structure of an image registration apparatus; for convenience of explanation, only the portions related to the embodiment are shown, including:
a feature extraction unit 51, configured to perform feature extraction on the first image and the second image with a feature extraction network to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
an attention processing unit 52, configured to input the extracted first feature maps and second feature maps into an attention module for processing and obtain attention-processed first feature maps and second feature maps; and
a feature point pair acquisition unit 53, configured to determine the similarity between the pixel points contained in the attention-processed first feature map and second feature map of the same scale, and obtain matched feature point pairs between the first image and the second image based on the determined similarity between pixel points.
Specifically, the feature extraction network includes a twin neural network including a first sub-network and a second sub-network that are identical in structure and share weights, and the feature extraction unit 51 includes:
a first image obtaining unit 511, configured to process the first image with a first sub-network, and obtain a plurality of first feature maps of the first image under different scales; and
a second image obtaining unit 512, configured to process the second image using the second sub-network and obtain a plurality of second feature maps of the second image at different scales.
Further, the first image and the second image are adjacent frame images captured by a bronchoscope.
Specifically, the attention module includes a channel attention module and a spatial attention module, and when the channel attention module is disposed before the spatial attention module, the attention processing unit 52 includes:
the first processing unit 521 is configured to input any one of the first feature map and the second feature map into the channel attention module to perform a channel attention operation, and to input the result of the channel attention operation into the spatial attention module to perform a spatial attention operation, so as to obtain the corresponding attention-processed feature map.
Further, the channel attention module includes a first global pooling layer, a first one-dimensional convolution layer, and a first coefficient calculation layer, and the first processing unit 521 includes:
the first maximum value calculation unit is used for calculating the maximum value of any one of the first feature map and the second feature map in the space dimension through the first global pooling layer to obtain a corresponding third feature map;
The first convolution calculation unit is used for carrying out one-dimensional convolution calculation on the channel dimension of the third feature map through the first one-dimensional convolution layer;
the first normalization unit is used for performing normalization processing on the feature map subjected to one-dimensional convolution calculation through the first coefficient calculation layer to obtain a channel attention coefficient; and
and the first multiplication unit is used for processing the input feature map by using the obtained channel attention coefficient.
Further, the spatial attention module includes a second global pooling layer, a second one-dimensional convolution layer, and a second coefficient calculation layer connected in sequence, and the first processing unit 521 further includes:
the second maximum value calculation unit is used for calculating the maximum value of the input feature map of the spatial attention module in the channel dimension through the second global pooling layer to obtain a corresponding fourth feature map;
the second convolution calculation unit is used for carrying out one-dimensional convolution calculation on the space dimension of the fourth feature map through the second one-dimensional convolution layer;
the second normalization unit is used for normalizing the features subjected to one-dimensional convolution calculation through the second coefficient calculation layer to obtain a spatial attention coefficient; and
and the second multiplication unit is used for processing the input feature map of the spatial attention module by using the spatial attention coefficient.
Optionally, in another embodiment of the present application, if the attention module includes a channel attention module and a spatial attention module, and the spatial attention module is disposed before the channel attention module, the attention processing unit 52 includes:
the second processing unit is used for inputting any one of the first feature map and the second feature map into the spatial attention module so as to perform the spatial attention operation, inputting the result of the spatial attention operation into the channel attention module so as to perform the channel attention operation, and obtaining the corresponding attention-processed feature map.
Further, the image registration apparatus further includes:
the dimension adjustment unit is used for performing dimension adjustment, by means of the dimension-adjustment module, on the acquired first feature maps and second feature maps corresponding to different scales, so that the first feature maps and second feature maps input into the attention module are consistent in the spatial dimensions and the channel dimension.
In an embodiment of the present application, specifically, the feature point pair acquisition unit 53 includes:
the system comprises a feature map pair acquisition unit, a feature map processing unit and a feature map processing unit, wherein the feature map pair acquisition unit is used for acquiring any target feature map pair, and each target feature map pair comprises a first feature map and a second feature map which belong to the same scale and are subjected to attention processing;
the feature interaction unit is used for performing the feature interaction operation on the target feature map pair to obtain the interaction feature map corresponding to that target feature map pair;
the feature map separation unit is used for inputting the interaction feature map into the pre-trained first convolutional network and obtaining the separation result output by the first convolutional network, where the separation result comprises a first separated feature map corresponding to the first feature map, a second separated feature map corresponding to the second feature map, and the similarity between the pixel points contained in the first separated feature map and the second separated feature map.
Further, the feature point pair acquisition unit 53 further includes:
the feature point pair determining unit is used for determining, as matched feature point pairs, the pixel points contained in the first separated feature map and the second separated feature map whose corresponding similarity is not smaller than the set threshold.
In another embodiment of the present application, specifically, the feature point pair acquisition unit 53 includes:
the feature map selecting unit is used for selecting a first feature map and a second feature map which belong to the same scale and are subjected to attention processing; and
the similarity obtaining unit is used for inputting the selected first feature map and second feature map into the pre-trained second convolutional network to obtain the similarity, output by the second convolutional network, between the pixel points contained in the selected first feature map and the selected second feature map.
In the embodiment of the present invention, each unit or module of the image registration apparatus may be implemented by corresponding hardware or software; each unit or module may be independent, or multiple units or modules may be integrated into one, and the invention is not limited in this respect. For the specific implementation of each unit or module of the image registration apparatus, reference may be made to the description of the foregoing method embodiments, which is not repeated herein.
Referring to fig. 6, a hardware structure of an electronic device according to an embodiment of the present application is shown.
By way of example, the electronic apparatus may be any of various types of computer system devices, portable or non-removable, that perform wireless or wired communications. In particular, the electronic apparatus may be a desktop computer, a server, a mobile phone or smartphone (e.g., an iPhone(TM) or Android(TM) phone), a portable game device (e.g., Nintendo DS(TM), PlayStation Portable(TM), Gameboy Advance(TM)), a laptop computer, a PDA, a portable internet device, a portable medical device, a smart camera, a music player, a data storage device, or another handheld device such as a watch, in-ear headphones, a pendant, or a headset; the electronic apparatus may also be another wearable device (e.g., electronic glasses, electronic clothing, an electronic bracelet, an electronic necklace, or another head-mounted device (HMD)).
As shown in fig. 6, the electronic device 6 may include a control circuit, which may include a storage and processing circuit 61. The storage and processing circuit 61 may include memory, such as hard-drive memory, non-volatile memory (e.g., flash memory or other electrically programmable memory used to form solid-state drives, etc.), volatile memory (e.g., static or dynamic random access memory, etc.), and the like; the embodiments of the present application are not limited in this respect. The processing circuitry in the storage and processing circuit 61 may be used to control the operation of the electronic device 6. The processing circuitry may be implemented based on one or more microprocessors, microcontrollers, digital signal processors, baseband processors, power management units, audio codec chips, application-specific integrated circuits, display driver integrated circuits, and the like.
The storage and processing circuitry 61 may be used to run software in the electronic device 6 such as internet browsing applications, voice over internet protocol (Voice over Internet Protocol, VOIP) telephone call applications, email applications, media playing applications, operating system functions, and the like. Such software may be used to perform some control operations, such as image acquisition based on a camera, ambient light measurement based on an ambient light sensor, proximity sensor measurement based on a proximity sensor, information display functions implemented based on status indicators such as status indicators of light emitting diodes, touch event detection based on a touch sensor, functions associated with displaying information on multiple (e.g., layered) displays, operations associated with performing wireless communication functions, operations associated with collecting and generating audio signals, control operations associated with collecting and processing button press event data, and other functions in electronic device 6, to name a few.
Further, the memory stores executable program code, and a processor coupled to the memory invokes the executable program code stored in the memory to perform the image registration method as described in the foregoing embodiments, such as: the method described in steps S201-S203 in fig. 2.
Wherein the executable program code comprises individual units or modules of the image registration apparatus as described in the previous embodiments, such as: modules 41-43 in fig. 4. Specific processes for implementing the respective functions by the units or modules are described in the embodiments of the image registration apparatus, and are not described herein.
Further, the embodiments of the present application also provide a non-transitory computer-readable storage medium, which may be configured in the server in the above embodiments, and on which a computer program is stored, which when executed by a processor, implements the image registration method described in the foregoing embodiments of the image registration method.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described or illustrated in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of skill in the art will appreciate that the various illustrative modules/units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other manners. For example, the apparatus/terminal embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods in the above embodiments may also be implemented by a computer program instructing the related hardware. The computer program may be stored in a computer-readable storage medium, and the computer program, when executed by a processor, may carry out the steps of the various method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (14)

1. A method of image registration, the method comprising:
performing feature extraction on a first image and a second image respectively by using a feature extraction network, to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
sequentially inputting the extracted first feature map and second feature map into an attention module for processing, to obtain the first feature map and the second feature map after attention processing;
and determining the similarity between the pixels contained in the first feature map and the second feature map which correspond to the same scale and have been subjected to attention processing, and acquiring matched feature point pairs between the first image and the second image based on the determined similarity between the pixels.
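By way of illustration, claim 1 reads as a three-stage pipeline. The following minimal PyTorch sketch shows only the data flow; the extract_features, attention and match callables are hypothetical placeholders for the components elaborated in the dependent claims below, not the patent's implementation.

```python
import torch

def register(first_image, second_image, extract_features, attention, match):
    # Stage 1: multi-scale feature extraction for both images.
    first_maps = extract_features(first_image)
    second_maps = extract_features(second_image)
    # Stage 2: attention processing of every extracted feature map.
    first_maps = [attention(f) for f in first_maps]
    second_maps = [attention(f) for f in second_maps]
    # Stage 3: per-scale pixel similarity and matched feature point pairs.
    pairs = []
    for f1, f2 in zip(first_maps, second_maps):  # same-scale feature map pairs
        pairs.extend(match(f1, f2))
    return pairs
```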
2. The method of claim 1, wherein the feature extraction network comprises a twin neural network; the twin neural network comprises a first sub-network and a second sub-network which have identical structures and share weights; and the step of performing feature extraction on the first image and the second image respectively by using the feature extraction network comprises:
processing the first image by using the first sub-network to obtain a plurality of first feature maps of the first image at different scales;
and processing the second image by using the second sub-network to obtain a plurality of second feature maps of the second image at different scales.
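Because the two sub-networks share weights, a twin (Siamese) network can be sketched as a single backbone applied to both images. The layer sizes below are illustrative assumptions, not the patent's architecture.

```python
import torch
import torch.nn as nn

class TwinBackbone(nn.Module):
    """One set of weights; applying it twice realizes the two sub-networks."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)              # feature maps at three different scales
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        return [f1, f2, f3]

backbone = TwinBackbone()
first_maps = backbone(torch.randn(1, 3, 256, 256))   # first sub-network
second_maps = backbone(torch.randn(1, 3, 256, 256))  # second sub-network, same weights
```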
3. The method of claim 1, wherein the attention module comprises a channel attention module and a spatial attention module; and when the channel attention module is disposed before the spatial attention module, the step of sequentially inputting the extracted first feature map and second feature map into the attention module for processing to obtain the first feature map and the second feature map after attention processing comprises:
inputting any one of the first feature map and the second feature map into the channel attention module to perform a channel attention operation, and inputting the result of the channel attention operation into the spatial attention module to perform a spatial attention operation, so as to obtain the corresponding feature map after attention processing.
4. The method of claim 3, wherein the channel attention module comprises a first global pooling layer, a first one-dimensional convolution layer, and a first coefficient calculation layer;
inputting any one of the first feature map and the second feature map into the channel attention module to perform a channel attention operation, comprising:
calculating the maximum value of any one of the first feature map and the second feature map in the spatial dimension through the first global pooling layer to obtain a corresponding third feature map;
performing a one-dimensional convolution calculation on the channel dimension of the third feature map through the first one-dimensional convolution layer;
normalizing the feature map subjected to the one-dimensional convolution calculation through the first coefficient calculation layer to obtain a channel attention coefficient;
and processing the input feature map by using the obtained channel attention coefficient.
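A minimal PyTorch sketch of this channel attention operation follows; the kernel size is an assumed hyperparameter, and sigmoid is assumed for the normalization in the first coefficient calculation layer.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                       # x: (B, C, H, W)
        # First global pooling layer: spatial maximum -> "third feature map".
        y = x.amax(dim=(2, 3))                  # (B, C)
        # First one-dimensional convolution over the channel dimension.
        y = self.conv(y.unsqueeze(1))           # (B, 1, C)
        # First coefficient calculation layer: channel attention coefficients.
        y = self.sigmoid(y).squeeze(1)          # (B, C)
        return x * y.view(x.size(0), -1, 1, 1)  # reweight the input feature map
```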
5. The method of claim 3, wherein the spatial attention module comprises a second global pooling layer, a second one-dimensional convolution layer and a second coefficient calculation layer connected in sequence;
inputting a result of the channel attention operation into the spatial attention module to perform a spatial attention operation, comprising:
calculating the maximum value of the input feature map of the spatial attention module in the channel dimension through the second global pooling layer to obtain a corresponding fourth feature map;
performing a one-dimensional convolution calculation on the spatial dimension of the fourth feature map through the second one-dimensional convolution layer;
normalizing the features subjected to the one-dimensional convolution calculation through the second coefficient calculation layer to obtain a spatial attention coefficient;
and processing the input feature map of the spatial attention module by using the spatial attention coefficient.
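A matching sketch of the spatial attention operation; reading the claim's one-dimensional convolution as a single-channel convolution over the flattened spatial positions is an assumption (a 2D convolution, as in CBAM-style modules, would be a common alternative).

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Second global pooling layer: channel maximum -> "fourth feature map".
        y = x.amax(dim=1)                        # (B, H, W)
        # Second one-dimensional convolution over flattened spatial positions.
        y = self.conv(y.view(b, 1, h * w))       # (B, 1, H*W)
        # Second coefficient calculation layer: spatial attention coefficients.
        y = self.sigmoid(y).view(b, 1, h, w)
        return x * y                             # reweight the input feature map
```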
6. The method of claim 1, wherein the attention module comprises a channel attention module and a spatial attention module; and when the spatial attention module is disposed before the channel attention module, the step of sequentially inputting the extracted first feature map and second feature map into the attention module for processing to obtain the first feature map and the second feature map after attention processing comprises:
inputting any one of the first feature map and the second feature map into the spatial attention module to perform a spatial attention operation, and inputting the result of the spatial attention operation into the channel attention module to perform a channel attention operation, so as to obtain the corresponding feature map after attention processing.
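Claims 3 and 6 differ only in the serial order of the two sub-modules; reusing the ChannelAttention and SpatialAttention sketches above, the composition can be expressed as:

```python
import torch.nn as nn

class AttentionModule(nn.Module):
    """Serial composition: channel-first corresponds to claim 3, spatial-first to claim 6."""
    def __init__(self, channel_first=True):
        super().__init__()
        self.channel_attention = ChannelAttention()
        self.spatial_attention = SpatialAttention()
        self.channel_first = channel_first

    def forward(self, x):
        if self.channel_first:
            return self.spatial_attention(self.channel_attention(x))
        return self.channel_attention(self.spatial_attention(x))
```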
7. The method of claim 1, wherein the method further comprises:
performing, by a dimension adjustment module, dimension adjustment on the acquired first feature maps and second feature maps corresponding to different scales, so that the first feature maps and the second feature maps input into the attention module are consistent in spatial dimension and channel dimension.
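The claim does not specify how the dimensions are unified; one plausible sketch uses a 1x1 convolution for the channel dimension and bilinear resampling for the spatial dimension (both choices are assumptions).

```python
import torch.nn as nn
import torch.nn.functional as F

class DimensionAdjust(nn.Module):
    """Hypothetical dimension-adjustment module."""
    def __init__(self, in_channels, out_channels, out_size):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)  # unify channels
        self.out_size = out_size

    def forward(self, x):
        x = self.proj(x)
        # Unify the spatial dimension by resampling to a common size.
        return F.interpolate(x, size=self.out_size, mode="bilinear", align_corners=False)
```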
8. The method of claim 1, wherein the step of determining the similarity between the pixels included in the first feature map and the second feature map, which correspond to the same scale and are subjected to the attention process, comprises:
obtaining any target feature map pair, wherein each target feature map pair comprises a first feature map and a second feature map which belong to the same scale and have been subjected to attention processing;
performing a feature interaction operation on the target feature map pair to obtain an interaction feature map corresponding to the target feature map pair;
inputting the interaction feature map into a pre-trained first convolution network, and obtaining a separation result output by the first convolution network, wherein the separation result comprises a first separation feature map corresponding to the first feature map, a second separation feature map corresponding to the second feature map, and the similarity between the pixel points contained in the first separation feature map and the second separation feature map.
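The feature interaction operation is not spelled out in the claim; channel concatenation is used below as one plausible stand-in, and first_conv_net is a hypothetical pre-trained network assumed to return the two separated maps plus a pixel-wise similarity volume.

```python
import torch

def interact_and_separate(f1, f2, first_conv_net):
    # Interaction feature map (assumed form: channel concatenation).
    interaction_map = torch.cat([f1, f2], dim=1)      # (B, 2C, H, W)
    # Separation result: two separated maps and a similarity volume.
    sep1, sep2, similarity = first_conv_net(interaction_map)
    return sep1, sep2, similarity
```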
9. The method of claim 8, wherein the step of obtaining a matching pair of feature points between the first image and the second image based on the determined similarity between the pixels comprises:
determining, as the matched feature point pairs, the pixel points contained in the first separation feature map and the second separation feature map whose similarity is not smaller than a set threshold value.
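Thresholding the similarity volume then yields the matched pairs; a minimal sketch with an assumed threshold value:

```python
def matched_pairs(similarity, threshold=0.8):
    # similarity: (B, H*W, H*W); keep index pairs whose similarity is not
    # smaller than the set threshold (0.8 is an assumed value).
    return (similarity >= threshold).nonzero(as_tuple=False)  # rows: (batch, i, j)
```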
10. The method of claim 1, wherein the step of determining the similarity between the pixels included in the first feature map and the second feature map, which correspond to the same scale and are subjected to the attention process, comprises:
selecting a first feature map and a second feature map which belong to the same scale and have been subjected to attention processing;
inputting the selected first feature map and second feature map into a pre-trained second convolution network, and obtaining the similarity, output by the second convolution network, between the pixel points contained in the selected first feature map and the selected second feature map.
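The patent leaves the second convolution network's internals open; pixel-wise cosine similarity is one plausible form of the output such a network could be trained to produce, sketched here for concreteness.

```python
import torch
import torch.nn.functional as F

def pixelwise_similarity(f1, f2):
    # Cosine similarity between every pixel of f1 and every pixel of f2.
    b, c, h, w = f1.shape
    v1 = F.normalize(f1.flatten(2), dim=1)        # (B, C, H*W), unit-norm channels
    v2 = F.normalize(f2.flatten(2), dim=1)
    return torch.einsum("bci,bcj->bij", v1, v2)   # (B, H*W, H*W)
```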
11. The method of claim 1, wherein the first image and the second image are adjacent frame images captured by a bronchoscope.
12. An image registration apparatus, the apparatus comprising:
a feature extraction unit, configured to perform feature extraction on a first image and a second image respectively by using a feature extraction network, to obtain a plurality of first feature maps of the first image at different scales and a plurality of second feature maps of the second image at different scales;
an attention processing unit, configured to sequentially input the extracted first feature map and second feature map into an attention module for processing, to obtain a first feature map and a second feature map after attention processing; and
a feature point pair acquisition unit, configured to determine the similarity between the pixels contained in the first feature map and the second feature map which correspond to the same scale and have been subjected to attention processing, and to acquire matched feature point pairs between the first image and the second image based on the determined similarity between the pixels.
13. An electronic device, comprising a memory and a processor; wherein
the memory stores executable program code;
the processor is coupled to the memory, and invokes the executable program code stored in the memory to perform the method of any one of claims 1 to 11.
14. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1 to 11.
CN202210900039.3A 2022-07-28 2022-07-28 Image registration method, device and storage medium Pending CN117523226A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210900039.3A CN117523226A (en) 2022-07-28 2022-07-28 Image registration method, device and storage medium
PCT/CN2023/105843 WO2024022060A1 (en) 2022-07-28 2023-07-05 Image registration method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210900039.3A CN117523226A (en) 2022-07-28 2022-07-28 Image registration method, device and storage medium

Publications (1)

Publication Number Publication Date
CN117523226A true CN117523226A (en) 2024-02-06

Family

ID=89705315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210900039.3A Pending CN117523226A (en) 2022-07-28 2022-07-28 Image registration method, device and storage medium

Country Status (2)

Country Link
CN (1) CN117523226A (en)
WO (1) WO2024022060A1 (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110675423A (en) * 2019-08-29 2020-01-10 电子科技大学 Unmanned aerial vehicle tracking method based on twin neural network and attention model
CN112749602A (en) * 2019-10-31 2021-05-04 北京市商汤科技开发有限公司 Target query method, device, equipment and storage medium
CN113033249A (en) * 2019-12-09 2021-06-25 中兴通讯股份有限公司 Character recognition method, device, terminal and computer storage medium thereof
CN111091576B (en) * 2020-03-19 2020-07-28 腾讯科技(深圳)有限公司 Image segmentation method, device, equipment and storage medium
CN114529963A (en) * 2020-11-23 2022-05-24 中兴通讯股份有限公司 Image processing method, image processing device, electronic equipment and readable storage medium
CN112560695B (en) * 2020-12-17 2023-03-24 中国海洋大学 Underwater target tracking method, system, storage medium, equipment, terminal and application
CN113177916B (en) * 2021-04-21 2023-04-07 清华大学深圳国际研究生院 Slight hypertension fundus identification model based on few-sample learning method

Also Published As

Publication number Publication date
WO2024022060A1 (en) 2024-02-01

Similar Documents

Publication Publication Date Title
TWI717865B (en) Image processing method and device, electronic equipment, computer readable recording medium and computer program product
CN109922372B (en) Video data processing method and device, electronic equipment and storage medium
CN109766925B (en) Feature fusion method and device, electronic equipment and storage medium
CN112862877B (en) Method and apparatus for training an image processing network and image processing
TWI778313B (en) Method and electronic equipment for image processing and storage medium thereof
CN107368791A (en) Living iris detection method and Related product
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN114612987A (en) Expression recognition method and device
CN113902636A (en) Image deblurring method and device, computer readable medium and electronic equipment
CN113052096A (en) Video detection method, device, equipment and storage medium
CN111833285A (en) Image processing method, image processing device and terminal equipment
CN116805282A (en) Image super-resolution reconstruction method, model training method, device and electronic equipment
CN117523226A (en) Image registration method, device and storage medium
CN112861687B (en) Mask wearing detection method, device, equipment and medium for access control system
CN115223018A (en) Cooperative detection method and device for disguised object, electronic device and storage medium
CN112132871B (en) Visual feature point tracking method and device based on feature optical flow information, storage medium and terminal
CN110659726B (en) Image processing method and device, electronic equipment and storage medium
CN110751223B (en) Image matching method and device, electronic equipment and storage medium
CN114549327A (en) Video super-resolution method, device, electronic equipment and storage medium
CN113920023A (en) Image processing method and device, computer readable medium and electronic device
CN111723715B (en) Video saliency detection method and device, electronic equipment and storage medium
US20230063201A1 (en) Image processing device and super-resolution processing method
CN111091593B (en) Image processing method, device, electronic equipment and storage medium
CN116935002A (en) Lung image prediction method, device and computer readable storage medium
CN116740120A (en) Background replacement method, device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination