WO2022242713A1 - Image alignment method and device - Google Patents

Image alignment method and device

Info

Publication number
WO2022242713A1
Authority
WO
WIPO (PCT)
Prior art keywords: feature, target, features, image, offset
Application number
PCT/CN2022/093799
Other languages
English (en)
French (fr)
Inventor
董航
Original Assignee
北京字跳网络技术有限公司
Application filed by 北京字跳网络技术有限公司
Publication of WO2022242713A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular to an image alignment method and device.
  • Image alignment refers to the process of determining the change parameters between the reference image and the target image, and deforming the target image into the same spatial layout as the reference image according to the change parameters.
  • Image alignment is widely used in video repair, image fusion, image stitching, object recognition and other fields. For example: in video restoration, by aligning adjacent image frames, the information between adjacent image frames can be effectively used to obtain more detailed information of image frames, thereby obtaining clearer and more detailed videos.
  • the traditional image alignment method is: calculate the optical flow (Optical Flow) field between the target image and the reference image, use the optical flow field as the dense registration relationship between the target image and the reference image, and finally align the target image to the reference image by back-warping.
  • an embodiment of the present disclosure provides an image alignment method, including:
  • the target features include feature points corresponding to pixels in the target image
  • the reference features include feature points corresponding to pixels in the reference image
  • the similarity feature includes: the similarity between each feature point in the target feature and the corresponding related feature points
  • the related feature points corresponding to a feature point in the target feature include: the feature points in the reference feature whose pixel coordinates are the same as and adjacent to the pixel coordinates of the feature point in the target feature;
  • the related feature points corresponding to the first feature point in the target feature include: the second feature point and the feature points within the first-preset-value neighborhood of the second feature point;
  • the second feature point is a feature point whose pixel coordinates in the reference feature are the same as those of the first feature point.
  • the obtaining the similarity feature according to the target feature and the reference feature includes:
  • the second spatial domain being the spatial domain formed by the fourth feature point and the feature points within the second-preset-value neighborhood of the fourth feature point; the fourth feature point is a related feature point corresponding to the third feature point;
  • the feature group includes feature points belonging to the first spatial domain and feature points belonging to the second spatial domain, and the positions of the feature points belonging to the first spatial domain in the first spatial domain are the same as the positions of the feature points belonging to the second spatial domain in the second spatial domain;
  • the outer products of the feature groups are summed to obtain the similarity between the third feature point and the fourth feature point.
  • the target feature is a feature obtained by feature extraction of pixels in the target image
  • the reference feature is a feature obtained by performing feature extraction on pixels in the reference image
  • the target feature is a feature obtained by extracting features from pixels in the target image and downsampling the extracted features at a preset downsampling rate;
  • the reference feature is a feature obtained by extracting features from pixels in the reference image and downsampling the extracted features at the preset downsampling rate.
  • the acquiring an offset between the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolutional layer includes:
  • the output of the offset prediction convolutional layer is acquired as the offset between the target feature and the reference feature.
  • the target feature includes sub-target features of multiple spatial scales
  • the reference feature includes sub-reference features of multiple spatial scales
  • the similarity feature includes sub-similarity features of multiple spatial scales.
  • the acquiring the offset between the target feature and the reference feature according to the similarity feature, the target feature, and the offset prediction convolutional layer includes:
  • according to the sub-similarity features of the multiple spatial scales, the sub-target features of the multiple spatial scales, and the offset prediction convolutional layers corresponding to the multiple spatial scales, acquiring the sub-offsets of the sub-target features and sub-reference features of the multiple spatial scales.
  • the aligning the reference image with the target image according to the offset and the deformable convolution layer includes:
  • the reference image is aligned with the target image according to the alignment result of the reference feature and the target feature.
  • the target feature includes sub-target features of multiple spatial scales
  • the reference feature includes sub-reference features of multiple spatial scales
  • the offset includes sub-offsets of multiple spatial scales
  • the aligning the reference image with the target image according to the offset and the deformable convolutional layer includes:
  • the target image is the nth image frame of the video to be repaired, and the reference image is the (n+1)th image frame of the video to be repaired; n is a positive integer.
  • an image alignment device including:
  • a feature acquisition unit configured to acquire target features and reference features;
  • the target features include feature points corresponding to pixels in the target image, and the reference features include feature points corresponding to pixels in the reference image;
  • a similarity acquisition unit configured to acquire a similarity feature according to the target feature and the reference feature;
  • the similarity feature includes: the similarity between each feature point in the target feature and the corresponding related feature points;
  • the relevant feature points corresponding to the feature points in the target feature include: feature points whose pixel coordinates in the reference feature are the same as and adjacent to the pixel coordinates of the feature points in the target feature;
  • An offset acquisition unit configured to acquire the offset between the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolutional layer;
  • a processing unit configured to align the reference image with the target image according to the offset and the deformable convolution layer.
  • the related feature points corresponding to the first feature point in the target feature include: the second feature point and the feature points within the first-preset-value neighborhood of the second feature point;
  • the second feature point is a feature point whose pixel coordinates in the reference feature are the same as those of the first feature point.
  • the similarity acquiring unit is specifically configured to: determine a first spatial domain corresponding to a third feature point in the target feature, the first spatial domain being the spatial domain formed by the third feature point and the feature points within the second-preset-value neighborhood of the third feature point; determine a second spatial domain corresponding to a fourth feature point in the reference feature, the second spatial domain being the spatial domain formed by the fourth feature point and the feature points within the second-preset-value neighborhood of the fourth feature point, the fourth feature point being a related feature point corresponding to the third feature point; calculate the outer product of the feature points in each feature group to obtain the outer product of each feature group, the feature group including a feature point belonging to the first spatial domain and a feature point belonging to the second spatial domain, where the position of the feature point belonging to the first spatial domain within the first spatial domain is the same as the position of the feature point belonging to the second spatial domain within the second spatial domain; and sum the outer products of the feature groups to obtain the similarity between the third feature point and the fourth feature point.
  • the target feature is a feature obtained by feature extraction of pixels in the target image
  • the reference feature is a feature obtained by performing feature extraction on pixels in the reference image
  • the target feature is a feature obtained by extracting features from pixels in the target image and downsampling the extracted features at a preset downsampling rate;
  • the reference feature is a feature obtained by extracting features from pixels in the reference image and downsampling the extracted features at the preset downsampling rate.
  • the offset acquisition unit is specifically configured to connect the similarity feature and the target feature in series in the channel dimension to acquire the offset prediction feature;
  • the offset prediction feature is input into the offset prediction convolutional layer; the output of the offset prediction convolutional layer is obtained as an offset between the target feature and the reference feature.
  • the target feature includes sub-target features of multiple spatial scales
  • the reference feature includes sub-reference features of multiple spatial scales
  • the offset includes sub-offsets of multiple spatial scales
  • the offset acquisition unit is specifically configured to acquire the sub-offsets of the sub-target features and sub-reference features of the multiple spatial scales according to the sub-similarity features of the multiple spatial scales, the sub-target features of the multiple spatial scales, and the offset prediction convolutional layers corresponding to the multiple spatial scales.
  • the processing unit is specifically configured to input the reference feature into the deformable convolution layer, and control the shape of the convolution kernel of the deformable convolution layer through the offset;
  • the output of the deformable convolutional layer is obtained as an alignment result of the reference feature and the target feature.
  • the target feature includes sub-target features of multiple spatial scales
  • the reference feature includes sub-reference features of multiple spatial scales
  • the offset includes sub-offsets of multiple spatial scales
  • the processing unit is specifically configured to acquire the alignment results of the sub-target features and sub-reference features of the multiple spatial scales according to the sub-offsets of the multiple spatial scales and the deformable convolution layers corresponding to the multiple spatial scales; and to align the reference image with the target image according to the alignment results of the sub-target features and sub-reference features of the multiple spatial scales.
  • the target image is the nth image frame of the video to be repaired, and the reference image is the (n+1)th image frame of the video to be repaired; n is a positive integer.
  • an embodiment of the present disclosure provides an electronic device, including: a memory and a processor; the memory is used to store a computer program, and the processor is configured to, when invoking the computer program, cause the electronic device to implement the image alignment method described in the first aspect or any optional implementation manner of the first aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium; when the computer program is executed by a computing device, the computing device is caused to implement the image alignment method described in the first aspect or any optional implementation manner of the first aspect.
  • an embodiment of the present disclosure provides a computer program product which, when run on a computer, causes the computer to implement the image alignment method described in the first aspect or any optional implementation manner of the first aspect.
  • FIG. 1 is a flowchart of steps of an image alignment method provided by an embodiment of the present disclosure
  • FIG. 2 is one of the schematic diagrams of related feature points corresponding to the feature points provided by the embodiments of the present disclosure
  • FIG. 3 is the second schematic diagram of the related feature points corresponding to the feature points provided by the embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a first spatial domain corresponding to feature points provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of a second spatial domain corresponding to feature points provided by an embodiment of the present disclosure
  • FIG. 6 is one of the schematic flowcharts of the image alignment method provided by the embodiment of the present disclosure.
  • FIG. 7 is the second schematic flow diagram of the image alignment method provided by the embodiment of the present disclosure.
  • FIG. 8 is the third schematic flow diagram of the image alignment method provided by the embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of an image alignment device provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present disclosure.
  • words such as “exemplary” or “for example” are used as examples, illustrations or illustrations. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present disclosure shall not be construed as being preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “such as” is intended to present related concepts in a concrete manner.
  • the meaning of "plurality” refers to two or more.
  • a feature-based image alignment method has been proposed in the related art, which specifically includes: obtaining the reference features of the reference image and the target features of the target image, predicting an offset according to the reference features and the target features, and then aligning the reference features with the target features based on a deformable convolutional layer controlled by the offset to obtain the final alignment result.
  • the above-mentioned feature-based image alignment method does not need to calculate the optical flow field, so it is more efficient and can directly align image features; however, there is no initial value when predicting the offset, so when the image quality of the reference image and the target image is poor, the alignment result of the deformable convolutional layer will differ greatly from the true value of the alignment result.
  • the present disclosure provides an image alignment method and device, which are used to solve the problem in the related art that the lack of an initial value when predicting the offset leads to a large difference between the alignment result of the deformable convolutional layer and the true value of the alignment result.
  • the general inventive concept of the embodiments of the present disclosure is as follows: in practical applications, when the input image has undergone relatively severe degradation (blur, haze, noise), the alignment result of the deformable convolutional layer is very unstable and prone to gradient explosion.
  • the source of the instability lies in the early stage of training: because the initial value of the offset is uncertain, the initially predicted offset differs greatly from the actual true value. Therefore, in order to stabilize the training process, the embodiments of the present disclosure introduce the correlation layer from optical flow networks, and use the similarity feature between the target feature and the reference feature obtained by the correlation layer as a guide for the offset. Since the similarity feature is closely related to optical flow, guidance by the similarity feature can alleviate the problem of the initially predicted offset differing greatly from the actual true value.
  • An embodiment of the present disclosure provides an image alignment method, as shown in FIG. 1 , the image alignment method includes the following steps:
  • S101: acquire target features and reference features. The target features include feature points corresponding to pixels in the target image;
  • the reference features include feature points corresponding to pixels in the reference image.
  • the target image and the reference image may be adjacent image frames in a video. That is, the target image is the nth image frame of the video to be repaired, the reference image is the (n+1)th image frame of the video to be repaired, and n is a positive integer.
  • S102: acquire a similarity feature according to the target features and the reference features. The similarity feature includes: the similarity between each feature point in the target feature and the corresponding related feature points; the related feature points corresponding to a feature point in the target feature include: the feature points in the reference feature whose pixel coordinates are the same as and adjacent to the pixel coordinates of the feature point in the target feature.
  • the pixel coordinates of the feature points in the embodiments of the present disclosure refer to the pixel coordinates of the pixel points corresponding to the feature points in the image to which they belong.
  • for example: the feature point corresponding to the pixel I11 with pixel coordinates (1,1) in the reference image is Fa11, so the pixel coordinates of the feature point Fa11 are (1,1).
  • for another example: the feature point corresponding to the pixel I23 with pixel coordinates (2,3) in the target image is Fb23, so the pixel coordinates of the feature point Fb23 are (2,3).
  • in the embodiments of the present disclosure, pixel coordinates being the same as or adjacent to other pixel coordinates means that the two pixels belong to a preset coordinate range, with the restriction that no other pixel coordinates lie between the two pixel coordinates.
  • the related feature points corresponding to the first feature point in the target feature include: the second feature point and the feature points within the first-preset-value neighborhood of the second feature point;
  • the second feature point is a feature point whose pixel coordinates in the reference feature are the same as those of the first feature point.
  • assuming the first preset value is d, the related feature points corresponding to the first feature point include: the second feature point whose pixel coordinates in the reference feature are the same as the pixel coordinates of the first feature point, and the feature points within the d*d neighborhood of the second feature point.
  • d may be 9. That is, the related feature points corresponding to the first feature point include: the second feature point whose pixel coordinates in the reference feature are the same as the pixel coordinates of the first feature point, and the feature points within the 9*9 neighborhood of the second feature point.
  • referring to FIG. 2, taking the case where the resolutions of the target image and the reference image are both 6*6 and the first preset value is 3 as an example, the related feature points corresponding to a feature point in the target feature are explained.
  • the pixel coordinates of the feature point Fa33 in the target feature 21 are (3,3), and the feature point with pixel coordinates (3,3) in the reference feature 22 is Fb33; thus, when the first feature point is Fa33, the second feature point is Fb33
  • the feature points within the 3*3 neighborhood of the second feature point Fb33 include: Fb22, Fb23, Fb24, Fb32, Fb34, Fb42, Fb43, Fb44; therefore the related feature points corresponding to the feature point Fa33 in the target feature 21 include 9 feature points in the reference feature: Fb22, Fb23, Fb24, Fb32, Fb33, Fb34, Fb42, Fb43, Fb44.
  • referring to FIG. 3, again taking the case where the resolutions of the target image and the reference image are both 6*6 and the first preset value is 3 as an example, the related feature points corresponding to a feature point in the target feature are explained.
  • the pixel coordinates of the feature point Fa46 in the target feature 31 are (4,6), and the feature point with pixel coordinates (4,6) in the reference feature 32 is Fb46; thus, when the first feature point is Fa46, the second feature point is Fb46; the feature points within the 3*3 neighborhood of the second feature point Fb46 include: Fb35, Fb36, Fb45, Fb55, Fb56; therefore the related feature points corresponding to the feature point Fa46 in the target feature 31 include 6 feature points in the reference feature: Fb35, Fb36, Fb45, Fb46, Fb55, Fb56.
  • taking each feature point in the target feature (Fa11, Fa12, Fa13, ..., Fa65, Fa66) as the first feature point one by one in the same way, the related feature points corresponding to each feature point in the target feature can be determined.
  • determining, as the related feature points corresponding to the first feature point, the second feature point whose pixel coordinates in the reference feature are the same as those of the first feature point together with the feature points within the first-preset-value neighborhood of the second feature point, rather than only the second feature point, increases the receptive field of the similarity feature when the similarity feature is acquired, thereby avoiding an inaccurate similarity feature when the true value of the offset between the reference image and the target image is large.
  • in step S102, the similarity feature is obtained according to the target feature and the reference feature through the following steps a to d.
  • Step a Determine the first spatial domain corresponding to the third feature point in the target feature.
  • the first spatial domain is the spatial domain formed by the third feature point and the feature points in the second-preset-value neighborhood of the third feature point; the third feature point is any feature point in the target feature.
  • assuming the second preset value is k, the first spatial domain is the spatial domain formed by the third feature point and the feature points within the k*k neighborhood of the third feature point.
  • referring to FIG. 4, taking the case where the resolutions of the target image and the reference image are both 6*6 and the second preset value is 3 as an example, the first spatial domain corresponding to a feature point in the target feature is explained.
  • the pixels within the 3*3 neighborhood of the feature point Fa33 in the target feature 41 include: Fa22, Fa23, Fa24, Fa32, Fa34, Fa42, Fa43, Fa44; therefore the first spatial domain corresponding to the feature point Fa33 is the spatial domain 400 composed of Fa22, Fa23, Fa24, Fa32, Fa33, Fa34, Fa42, Fa43, and Fa44.
  • Step b Determine the second spatial domain corresponding to the fourth feature point in the reference feature.
  • the second spatial domain is the spatial domain formed by the fourth feature point and the feature points within the second-preset-value neighborhood of the fourth feature point; the fourth feature point is a related feature point corresponding to the third feature point.
  • the second spatial domain is the spatial domain formed by the fourth feature point and the feature points within the k*k neighborhood of the fourth feature point.
  • referring to FIG. 5, taking the case where the resolutions of the target image and the reference image are both 6*6 and the second preset value is 3 as an example, the second spatial domain corresponding to a feature point in the reference feature is described.
  • the second spatial domain corresponding to the feature point Fb22 (a related feature point of Fa33 in FIG. 4) is the spatial domain 500 composed of Fb11, Fb12, Fb13, Fb21, Fb22, Fb23, Fb31, Fb32, and Fb33.
  • Step c Calculate the outer product of the feature points in each feature group, and obtain the outer product of each feature group.
  • the feature group includes a feature point belonging to the first spatial domain and a feature point belonging to the second spatial domain, and the position of the feature point belonging to the first spatial domain within the first spatial domain is the same as the position of the feature point belonging to the second spatial domain within the second spatial domain.
  • the outer product in the embodiments of the present disclosure refers to the vector product of two feature vectors.
  • following FIG. 4 and FIG. 5, the feature points with the same position in their respective spatial domains include: Fa22 and Fb11, Fa23 and Fb12, Fa24 and Fb13, Fa32 and Fb21, Fa33 and Fb22, Fa34 and Fb23, Fa42 and Fb31, Fa43 and Fb32, Fa44 and Fb33; therefore the outer products of the feature groups (Fa22, Fb11), (Fa23, Fb12), (Fa24, Fb13), (Fa32, Fb21), (Fa33, Fb22), (Fa34, Fb23), (Fa42, Fb31), (Fa43, Fb32), and (Fa44, Fb33) are calculated, yielding Fa22×Fb11, Fa23×Fb12, Fa24×Fb13, Fa32×Fb21, Fa33×Fb22, Fa34×Fb23, Fa42×Fb31, Fa43×Fb32, Fa44×Fb33.
  • Step d summing the outer products of each feature group to obtain the similarity between the third feature point and the fourth feature point.
  • the outer products of the feature groups include: Fa22×Fb11, Fa23×Fb12, Fa24×Fb13, Fa32×Fb21, Fa33×Fb22, Fa34×Fb23, Fa42×Fb31, Fa43×Fb32, Fa44×Fb33; therefore
  • the similarity between the third feature point Fa33 and the fourth feature point Fb22 is: Fa22×Fb11+Fa23×Fb12+Fa24×Fb13+Fa32×Fb21+Fa33×Fb22+Fa34×Fb23+Fa42×Fb31+Fa43×Fb32+Fa44×Fb33.
  • more generally, the similarity between a feature point x1 in the target feature and a corresponding related feature point x2 in the reference feature can be calculated by the following formula: c(x1, x2) = Σ_{o ∈ Nk} f1(x1+o) × f2(x2+o)
  • where c(x1, x2) is the similarity between the feature points x1 and x2; k is a constant (the second preset value), and Nk is the set of offsets covering a k*k neighborhood; f1(x1+o) represents x1 and the feature points within the k*k neighborhood of x1; and f2(x2+o) represents x2 and the feature points within the k*k neighborhood of x2.
  • through the above steps a to d, the similarity between the third feature point and each corresponding related feature point can be obtained; then, taking each feature point in the target feature (Fa11, Fa12, Fa13, ..., Fa65, Fa66) as the third feature point, the similarity between each feature point in the target feature and its corresponding related feature points is obtained, and the similarity feature is thereby obtained.
  • in the above embodiment, when calculating the similarity between the third feature point and the fourth feature point, the first spatial domain corresponding to the third feature point and the second spatial domain corresponding to the fourth feature point are determined first; then the outer product of the feature points in each feature group is calculated to obtain the outer product of each feature group; finally the outer products of the feature groups are summed, and the summation result is used as the similarity between the third feature point and the fourth feature point.
  • the above embodiment increases the dimensionality over which the similarity is obtained, thereby improving the robustness of acquiring the similarity feature.
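  • as a concrete illustration, the following is a minimal PyTorch sketch of such a correlation layer. It is not the implementation from this disclosure: the layer sizes, the zero padding at image borders, and the use of an element-wise product summed over channels (the inner-product form used by FlowNet-style correlation layers) standing in for the outer-product-and-sum described above are all assumptions made for illustration, and d and k are assumed odd.

```python
import torch
import torch.nn.functional as F

def correlation(f_target, f_ref, d=3, k=3):
    """Similarity feature between a target feature and a reference feature.

    f_target, f_ref: (N, C, H, W) feature maps.
    d: the first preset value, giving the d*d window of related feature points.
    k: the second preset value, giving the k*k spatial domains of steps a and b.
    Returns a (N, d*d, H, W) similarity feature (one map per related position).
    """
    _, _, h, w = f_target.shape
    r = d // 2
    f_ref_pad = F.pad(f_ref, (r, r, r, r))   # missing neighbours at borders contribute 0
    out = []
    for dy in range(d):                      # enumerate the d*d related feature points
        for dx in range(d):
            shifted = f_ref_pad[:, :, dy:dy + h, dx:dx + w]
            # product of corresponding feature points, summed over channels
            prod = (f_target * shifted).sum(dim=1, keepdim=True)
            # sum over the k*k spatial domains (the per-group products of steps c and d)
            prod = F.avg_pool2d(prod, k, stride=1, padding=k // 2) * k ** 2
            out.append(prod)
    return torch.cat(out, dim=1)             # similarity feature Fc
```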
  • an implementation of the above step S103 (acquiring the offset between the target feature and the reference feature according to the similarity feature, the target feature, and the offset prediction convolutional layer) includes the following steps 1 to 3:
  • Step 1 Concatenate the similarity feature and the target feature along the channel dimension to obtain the offset prediction feature.
  • the number of channels of a feature in the embodiments of the present disclosure refers to the number of feature maps contained in the feature; a channel of a feature is a feature map obtained by extracting features along a certain dimension, i.e., a feature map in a specific sense. Concatenating the similarity feature and the target feature along the channel dimension to obtain the offset prediction feature means connecting the feature maps of the similarity feature after the feature maps of the target feature, so as to obtain an offset prediction feature that contains, in sequence, all the feature maps of the target feature and all the feature maps of the similarity feature.
  • Step 2 Input the offset prediction feature into the offset prediction convolutional layer.
  • Step 3 Obtain the output of the offset prediction convolutional layer as the offset between the target feature and the reference feature.
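  • a minimal sketch of steps 1 to 3, assuming PyTorch; the channel count C, the first preset value d, and the 3*3 deformable kernel below are illustrative assumptions, not values fixed by the disclosure:

```python
import torch
import torch.nn as nn

C, d = 64, 3                       # assumed feature channels and first preset value
offset_pred = nn.Conv2d(
    d * d + C,                     # channels of Ft: (d*d) from Fc plus C from F2
    2 * 3 * 3,                     # one (dy, dx) pair per sample point of a 3x3 kernel
    kernel_size=3, padding=1)      # the offset prediction convolutional layer

def predict_offset(fc, f2):
    ft = torch.cat([fc, f2], dim=1)   # step 1: concatenate along the channel dimension
    return offset_pred(ft)            # steps 2 and 3: the conv output is the offset Off
```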
  • an implementation of the above step S104 (aligning the reference image with the target image according to the offset and the deformable convolution layer) includes the following steps I to III:
  • Step I Input the reference feature into the deformable convolution layer, and control the shape of the convolution kernel of the deformable convolution layer through the offset.
  • Step II obtaining the output of the deformable convolutional layer as an alignment result of the reference feature and the target feature.
  • Step III Align the reference image with the target image according to the alignment result of the reference feature and the target feature.
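  • steps I to III can be sketched with torchvision's deformable convolution, where feeding the predicted offset as the kernel's per-sample displacements is what "controls the shape of the convolution kernel"; the layer sizes remain assumptions:

```python
from torchvision.ops import DeformConv2d

C = 64                                                  # assumed channel count, as above
deform = DeformConv2d(C, C, kernel_size=3, padding=1)   # the deformable convolution layer

def align_features(f1, off):
    # Step I: input the reference feature F1; the offset Off (18 = 2*3*3 channels,
    # matching predict_offset above) displaces the 3x3 kernel's sampling locations.
    # Step II: the output is the alignment result Fa of F1 with the target feature.
    return deform(f1, off)
```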
  • the reference feature is expressed as F1
  • the target feature is expressed as F2
  • the similarity feature is expressed as Fc
  • the offset prediction feature is expressed as Ft
  • the offset is expressed as Off
  • the alignment result is expressed as Fa
  • the module that obtains the similarity feature is called a correlation layer, and the module that obtains the offset prediction feature is represented as a concatenation layer.
  • the reference feature F1 and the target feature F2 are input into the correlation layer 61, and the output of the correlation layer 61 is obtained as the similarity feature Fc.
  • the similarity feature Fc and the target feature F2 are input into the concatenation layer 62, and the output of the concatenation layer 62 is obtained as the offset prediction feature Ft.
  • the feature dimensions of the reference feature F1 and the target feature F2 are C×H×W
  • the feature dimension of the similarity feature Fc is (d*d)×H×W
  • the feature dimension of the offset prediction feature Ft is (d*d+C)×H×W; wherein, d is the first preset value.
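  • putting the sketches above together, the dimension bookkeeping matches the text (tensor sizes are assumed for illustration):

```python
f1 = torch.randn(1, C, 64, 64)        # reference feature F1: C x H x W
f2 = torch.randn(1, C, 64, 64)        # target feature F2: C x H x W
fc = correlation(f2, f1, d=d, k=3)    # similarity feature Fc: (d*d) x H x W
off = predict_offset(fc, f2)          # offset Off, from Ft with (d*d + C) channels
fa = align_features(f1, off)          # alignment result Fa: C x H x W
```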
  • the image alignment method provided by the embodiments of the present disclosure first acquires the target feature, which includes the feature points corresponding to the pixels in the target image, and the reference feature, which includes the feature points corresponding to the pixels in the reference image; then acquires, according to the target feature and the reference feature, the similarity feature that includes the similarity between each feature point in the target feature and the corresponding related feature points; then acquires the offset between the target feature and the reference feature according to the similarity feature, the target feature, and the offset prediction convolutional layer; and finally aligns the reference image with the target image according to the offset and the deformable convolutional layer.
  • the target feature is a feature obtained by feature extraction of pixels in the target image
  • the reference feature is a feature obtained by performing feature extraction on pixels in the reference image
  • that is, the features extracted from the target image and the reference image are directly used as the target feature and the reference feature, respectively.
  • the target feature is a feature obtained by extracting features from pixels in the target image and downsampling the extracted features at a preset downsampling rate
  • the reference feature is a feature obtained by extracting features from pixels in the reference image and down-sampling the extracted features at the preset down-sampling rate.
  • the preset downsampling rate may be 1/16.
  • the feature extracted from the target image is down-sampled to 1/16 of the original feature as the target feature
  • the feature extracted from the reference image is down-sampled to 1/16 of the original feature as the reference feature.
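  • as a sketch, such downsampling can be a bilinear resize of the extracted features; whether the 1/16 rate applies per side or to the total resolution is not spelled out here, and the per-side reading is assumed below:

```python
import torch.nn.functional as F

def downsample(feat, rate=1 / 16):
    # feat: extracted feature map (N, C, H, W); rate: the preset downsampling rate
    return F.interpolate(feat, scale_factor=rate, mode='bilinear', align_corners=False)
```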
  • when the first preset value is set larger, the offset prediction convolutional layer can obtain a sufficiently large receptive field, but this also increases the amount of computation for calculating the similarity feature, which in turn affects image alignment efficiency.
  • by downsampling the features extracted from the target image and the features extracted from the reference image at a preset downsampling rate, the above embodiment can, while ensuring a sufficiently large receptive field, avoid an excessive amount of computation for the similarity feature that would otherwise reduce the efficiency of image alignment.
  • the image alignment method provided by the embodiment of the present disclosure further includes: performing a downsampling operation on the target feature F2 and the reference feature F1 (shown by a down arrow in FIG. 7), and performing an upsampling operation on the similarity feature (shown by an up arrow in FIG. 7).
  • a cascaded pyramid architecture can also be used to progressively align target features and reference features from multiple different spatial scales.
  • the target feature includes sub-target features of multiple spatial scales
  • the reference feature includes sub-reference features of multiple spatial scales
  • the similarity feature includes sub-similarity features of multiple spatial scales
  • according to the sub-similarity features of the multiple spatial scales, the sub-target features of the multiple spatial scales, and the offset prediction convolutional layers corresponding to the multiple spatial scales, the sub-offsets of the sub-target features and sub-reference features of the multiple spatial scales are obtained.
  • since the above embodiment acquires the offset between the features of the reference image and the features of the target image progressively across multiple spatial scales, it can acquire this offset more accurately.
  • the target feature includes sub-target features of multiple spatial scales
  • the reference feature includes sub-reference features of multiple spatial scales
  • the offset includes sub-offsets of multiple spatial scales
  • the above embodiment can improve the accuracy of the alignment result of the reference image and the target image.
  • aligning the reference image with the target image according to the offset and the deformable convolution layer includes:
  • the n-1th level spatial scale is smaller than the nth level spatial scale.
  • a 3-level cascaded pyramid architecture is used as an example to perform progressive alignment of target features and reference features from different spatial scales.
  • sub-reference features F1_1, F1_2, F1_3, and sub-target features F2_1, F2_2, and F2_3 are obtained from the spatial scales corresponding to the first level, the second level, and the third level.
  • first, the sub-offset Off_3 between the sub-reference feature F1_3 and the sub-target feature F2_3 of the third-level spatial scale and the corresponding alignment result Fa_3 are obtained through the image alignment method provided in the above embodiments; because the third-level spatial scale has no preceding spatial scale, the sub-offset Off_3 directly acts on the deformable convolutional layer corresponding to the third-level spatial scale, and the alignment result of the third-level spatial scale is its target alignment result.
  • then, through the image alignment method provided in the above embodiments, the sub-offset Off_2 between the sub-reference feature F1_2 and the sub-target feature F2_2 of the second-level spatial scale is obtained, and the target offset of the second-level spatial scale is generated according to the sub-offset Off_3 and the sub-offset Off_2; the target offset of the second-level spatial scale is then input into the deformable convolutional layer of the second-level spatial scale to obtain the second-level alignment result Fa_2, and the alignment result Fa_2 is fused with the target alignment result Fa_3 of the third-level spatial scale to obtain the target alignment result of the second-level spatial scale.
  • finally, through the image alignment method provided in the above embodiments, the sub-offset Off_1 between the sub-reference feature F1_1 and the sub-target feature F2_1 of the first-level spatial scale is obtained, and the target offset of the first-level spatial scale is generated according to the sub-offset Off_2 and the sub-offset Off_1; the target offset of the first-level spatial scale is then input into the deformable convolutional layer of the first-level spatial scale to obtain the first-level alignment result Fa_1, and the alignment result Fa_1 is fused with the target alignment result of the second-level spatial scale to obtain the target alignment result of the first-level spatial scale (the final alignment result).
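  • the 3-level cascade can be sketched as follows; the fusion rules (adding the upsampled, rescaled coarser offset to form each level's target offset, summing alignment results across levels, a fixed factor of 2 between adjacent scales, and reusing one offset_pred/deform pair for all levels) are simplifying assumptions for illustration rather than details fixed by FIG. 8:

```python
import torch.nn.functional as F

def pyramid_align(f1_levels, f2_levels):
    """f1_levels / f2_levels: sub-features ordered [level 1 (finest), level 2, level 3]."""
    target_off, target_fa = None, None
    for f1, f2 in reversed(list(zip(f1_levels, f2_levels))):   # level 3 -> level 1
        off = predict_offset(correlation(f2, f1, d=d, k=3), f2)
        if target_off is not None:
            # generate this level's target offset from its sub-offset and the
            # upsampled target offset of the previous (coarser) level
            off = off + 2.0 * F.interpolate(target_off, scale_factor=2,
                                            mode='bilinear', align_corners=False)
        fa = align_features(f1, off)                            # sub-alignment result
        if target_fa is not None:
            # fuse with the coarser level's target alignment result
            fa = fa + F.interpolate(target_fa, scale_factor=2,
                                    mode='bilinear', align_corners=False)
        target_off, target_fa = off, fa
    return target_fa       # target alignment result of level 1 (the final result)
```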
  • the embodiment of the present disclosure also provides an image alignment device; this device embodiment corresponds to the foregoing method embodiment, and for ease of reading, details already described in the foregoing method embodiment are not repeated here.
  • the image alignment device in this embodiment can correspondingly implement all the content in the foregoing method embodiments.
  • FIG. 9 is a schematic structural diagram of the image alignment device. As shown in FIG. 9 , the image alignment device 900 includes:
  • a feature acquisition unit 91 configured to acquire target features and reference features; the target features include feature points corresponding to pixels in the target image, and the reference features include feature points corresponding to pixels in the reference image;
  • a similarity acquiring unit 92 configured to acquire a similarity feature according to the target feature and the reference feature; the similarity feature includes: the similarity between each feature point in the target feature and the corresponding related feature point,
  • the related feature points corresponding to the feature points in the target feature include: feature points whose pixel coordinates in the reference feature are the same as and adjacent to the pixel coordinates of the feature points in the target feature;
  • An offset acquisition unit 93 configured to acquire an offset between the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolutional layer;
  • the processing unit 94 is configured to align the reference image with the target image according to the offset and the deformable convolution layer.
  • the related feature points corresponding to the first feature point in the target feature include: the second feature point and the feature points within the first-preset-value neighborhood of the second feature point;
  • the second feature point is a feature point whose pixel coordinates in the reference feature are the same as those of the first feature point.
  • the similarity acquiring unit 92 is specifically configured to: determine a first spatial domain corresponding to a third feature point in the target feature, the first spatial domain being the spatial domain formed by the third feature point and the feature points within the second-preset-value neighborhood of the third feature point; determine a second spatial domain corresponding to a fourth feature point in the reference feature, the second spatial domain being the spatial domain formed by the fourth feature point and the feature points within the second-preset-value neighborhood of the fourth feature point, the fourth feature point being a related feature point corresponding to the third feature point; calculate the outer product of the feature points in each feature group to obtain the outer product of each feature group, the feature group including a feature point belonging to the first spatial domain and a feature point belonging to the second spatial domain, where the position of the feature point belonging to the first spatial domain within the first spatial domain is the same as the position of the feature point belonging to the second spatial domain within the second spatial domain; and sum the outer products of the feature groups to obtain the similarity between the third feature point and the fourth feature point.
  • the target feature is a feature obtained by feature extraction of pixels in the target image
  • the reference feature is a feature obtained by performing feature extraction on pixels in the reference image
  • the target feature is a feature obtained by extracting features from pixels in the target image and downsampling the extracted features at a preset downsampling rate;
  • the reference feature is a feature obtained by extracting features from pixels in the reference image and downsampling the extracted features at the preset downsampling rate.
  • the offset acquisition unit 93 is specifically configured to connect the similarity feature and the target feature in series in the channel dimension, and acquire the offset prediction feature; Inputting the offset prediction feature into the offset prediction convolutional layer; obtaining an output of the offset prediction convolutional layer as an offset between the target feature and the reference feature.
  • the target feature includes sub-target features of multiple spatial scales
  • the reference feature includes sub-reference features of multiple spatial scales
  • the similarity feature includes sub-similarity features of multiple spatial scales.
  • the offset acquisition unit 93 is specifically configured to acquire the sub-offsets of the sub-target features and sub-reference features of the multiple spatial scales according to the sub-similarity features of the multiple spatial scales, the sub-target features of the multiple spatial scales, and the offset prediction convolutional layers corresponding to the multiple spatial scales.
  • the processing unit 94 is specifically configured to input the reference feature into the deformable convolution layer, and control the shape of the convolution kernel of the deformable convolution layer through the offset;
  • the output of the deformable convolutional layer is obtained as an alignment result of the reference feature and the target feature.
  • the target feature includes sub-target features of multiple spatial scales
  • the reference feature includes sub-reference features of multiple spatial scales
  • the offset includes sub-offsets of multiple spatial scales
  • the processing unit 94 is specifically configured to acquire the alignment results of the sub-target features and sub-reference features of the multiple spatial scales according to the sub-offsets of the multiple spatial scales and the deformable convolution layers corresponding to the multiple spatial scales; and to align the reference image with the target image according to the alignment results of the sub-target features and sub-reference features of the multiple spatial scales.
  • the target image is the nth image frame of the video to be repaired, and the reference image is the (n+1)th image frame of the video to be repaired; n is a positive integer.
  • the image alignment device provided in this embodiment can execute the image alignment method provided in the above method embodiment, and its implementation principle and technical effect are similar, and will not be repeated here.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device provided by this embodiment includes: a memory 101 and a processor 102; the memory 101 is used to store computer programs, and the processor 102 is configured to execute the image alignment method provided by the above-mentioned embodiments when calling a computer program.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, the image alignment method provided by the above-mentioned embodiments is implemented.
  • An embodiment of the present disclosure further provides a computer program product, which enables the computer to implement the image alignment method provided in the above embodiment when the computer program product is run on a computer.
  • the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Accordingly, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein.
  • the processor can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • Memory may include non-permanent storage in computer readable media, in the form of random access memory (RAM) and/or nonvolatile memory such as read only memory (ROM) or flash RAM.
  • RAM random access memory
  • ROM read only memory
  • flash RAM flash random access memory
  • Computer-readable media includes both volatile and non-volatile, removable and non-removable storage media.
  • the storage medium may store information by any method or technology, and the information may be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, A magnetic tape cartridge, disk storage or other magnetic storage device or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer readable media excludes transitory computer readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An image alignment method and device, relating to the technical field of image processing. The method includes: acquiring a target feature including feature points corresponding to pixels in a target image and a reference feature including feature points corresponding to pixels in a reference image; acquiring a similarity feature according to the target feature and the reference feature (S102), the similarity feature including the similarity between each feature point in the target feature and the corresponding related feature points; acquiring an offset between the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolutional layer (S103); and aligning the reference feature with the target feature according to the offset and a deformable convolutional layer (S104). The method is used to solve the problem in the related art that the lack of an initial value when predicting the offset causes the alignment result of the deformable convolutional layer to differ greatly from the true value of the alignment result.

Description

Image alignment method and device
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202110557632.8 filed on May 21, 2021, the disclosure of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the technical field of image processing, and in particular to an image alignment method and device.
Background
Image alignment refers to the process of determining the transformation parameters between a reference image and a target image, and deforming the target image into the same spatial layout as the reference image according to the transformation parameters. Image alignment is widely used in fields such as video restoration, image fusion, image stitching, and object recognition. For example, in video restoration, aligning adjacent image frames makes it possible to use the information shared between adjacent frames to recover more detail in each frame, thereby producing clearer, more detailed video.
The traditional image alignment method is to compute the optical flow (Optical Flow) field between the target image and the reference image, use the optical flow field as the dense registration relationship between the target image and the reference image, and finally align the target image to the reference image by back-warping.
Summary
Embodiments of the present disclosure provide the following technical solutions.
In a first aspect, an embodiment of the present disclosure provides an image alignment method, including:
acquiring a target feature and a reference feature, the target feature including feature points corresponding to pixels in a target image, and the reference feature including feature points corresponding to pixels in a reference image;
acquiring a similarity feature according to the target feature and the reference feature, the similarity feature including the similarity between each feature point in the target feature and the corresponding related feature points, where the related feature points corresponding to a feature point in the target feature include the feature points in the reference feature whose pixel coordinates are the same as and adjacent to the pixel coordinates of the feature point in the target feature;
acquiring an offset between the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolutional layer; and
aligning the reference image with the target image according to the offset and a deformable convolutional layer.
As an optional implementation of the embodiments of the present disclosure, the related feature points corresponding to a first feature point in the target feature include a second feature point and the feature points within a first-preset-value neighborhood of the second feature point;
where the second feature point is the feature point in the reference feature whose pixel coordinates are the same as the pixel coordinates of the first feature point.
As an optional implementation of the embodiments of the present disclosure, acquiring the similarity feature according to the target feature and the reference feature includes:
determining a first spatial domain corresponding to a third feature point in the target feature, the first spatial domain being the spatial domain formed by the third feature point and the feature points within a second-preset-value neighborhood of the third feature point;
determining a second spatial domain corresponding to a fourth feature point in the reference feature, the second spatial domain being the spatial domain formed by the fourth feature point and the feature points within a second-preset-value neighborhood of the fourth feature point, the fourth feature point being a related feature point corresponding to the third feature point;
computing the outer product of the feature points in each feature group to obtain the outer product of each feature group, where a feature group includes a feature point belonging to the first spatial domain and a feature point belonging to the second spatial domain, and the position of the feature point belonging to the first spatial domain within the first spatial domain is the same as the position of the feature point belonging to the second spatial domain within the second spatial domain; and
summing the outer products of the feature groups to obtain the similarity between the third feature point and the fourth feature point.
As an optional implementation of the embodiments of the present disclosure, the target feature is a feature obtained by performing feature extraction on the pixels in the target image, and the reference feature is a feature obtained by performing feature extraction on the pixels in the reference image;
or,
the target feature is a feature obtained by performing feature extraction on the pixels in the target image and downsampling the extracted features at a preset downsampling rate, and the reference feature is a feature obtained by performing feature extraction on the pixels in the reference image and downsampling the extracted features at the preset downsampling rate.
As an optional implementation of the embodiments of the present disclosure, acquiring the offset between the target feature and the reference feature according to the similarity feature, the target feature, and the offset prediction convolutional layer includes:
concatenating the similarity feature and the target feature along the channel dimension to obtain an offset prediction feature;
inputting the offset prediction feature into the offset prediction convolutional layer; and
acquiring the output of the offset prediction convolutional layer as the offset between the target feature and the reference feature.
As an optional implementation of the embodiments of the present disclosure, the target feature includes sub-target features of multiple spatial scales, the reference feature includes sub-reference features of multiple spatial scales, and the similarity feature includes sub-similarity features of multiple spatial scales;
acquiring the offset between the target feature and the reference feature according to the similarity feature, the target feature, and the offset prediction convolutional layer includes:
acquiring the sub-offsets of the sub-target features and the sub-reference features of the multiple spatial scales according to the sub-similarity features of the multiple spatial scales, the sub-target features of the multiple spatial scales, and the offset prediction convolutional layers corresponding to the multiple spatial scales.
As an optional implementation of the embodiments of the present disclosure, aligning the reference image with the target image according to the offset and the deformable convolutional layer includes:
inputting the reference feature into the deformable convolutional layer, and controlling the shape of the convolution kernel of the deformable convolutional layer through the offset;
acquiring the output of the deformable convolutional layer as the alignment result of the reference feature and the target feature; and
aligning the reference image with the target image according to the alignment result of the reference feature and the target feature.
As an optional implementation of the embodiments of the present disclosure, the target feature includes sub-target features of multiple spatial scales, the reference feature includes sub-reference features of multiple spatial scales, and the offset includes sub-offsets of multiple spatial scales;
aligning the reference image with the target image according to the offset and the deformable convolutional layer includes:
acquiring the alignment results of the sub-target features and the sub-reference features of the multiple spatial scales according to the sub-offsets of the multiple spatial scales and the deformable convolutional layers corresponding to the multiple spatial scales; and
aligning the reference image with the target image according to the alignment results of the sub-target features and the sub-reference features of the multiple spatial scales.
As an optional implementation of the embodiments of the present disclosure, the target image is the nth image frame of a video to be repaired, and the reference image is the (n+1)th image frame of the video to be repaired, where n is a positive integer.
In a second aspect, an embodiment of the present disclosure provides an image alignment device, including:
a feature acquisition unit configured to acquire a target feature and a reference feature, the target feature including feature points corresponding to pixels in a target image, and the reference feature including feature points corresponding to pixels in a reference image;
a similarity acquisition unit configured to acquire a similarity feature according to the target feature and the reference feature, the similarity feature including the similarity between each feature point in the target feature and the corresponding related feature points, where the related feature points corresponding to a feature point in the target feature include the feature points in the reference feature whose pixel coordinates are the same as and adjacent to the pixel coordinates of the feature point in the target feature;
an offset acquisition unit configured to acquire an offset between the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolutional layer; and
a processing unit configured to align the reference image with the target image according to the offset and a deformable convolutional layer.
As an optional implementation of the embodiments of the present disclosure, the related feature points corresponding to a first feature point in the target feature include a second feature point and the feature points within a first-preset-value neighborhood of the second feature point;
where the second feature point is the feature point in the reference feature whose pixel coordinates are the same as the pixel coordinates of the first feature point.
As an optional implementation of the embodiments of the present disclosure, the similarity acquisition unit is specifically configured to: determine a first spatial domain corresponding to a third feature point in the target feature, the first spatial domain being the spatial domain formed by the third feature point and the feature points within a second-preset-value neighborhood of the third feature point; determine a second spatial domain corresponding to a fourth feature point in the reference feature, the second spatial domain being the spatial domain formed by the fourth feature point and the feature points within a second-preset-value neighborhood of the fourth feature point, the fourth feature point being a related feature point corresponding to the third feature point; compute the outer product of the feature points in each feature group to obtain the outer product of each feature group, where a feature group includes a feature point belonging to the first spatial domain and a feature point belonging to the second spatial domain, and the position of the feature point belonging to the first spatial domain within the first spatial domain is the same as the position of the feature point belonging to the second spatial domain within the second spatial domain; and sum the outer products of the feature groups to obtain the similarity between the third feature point and the fourth feature point.
As an optional implementation of the embodiments of the present disclosure, the target feature is a feature obtained by performing feature extraction on the pixels in the target image, and the reference feature is a feature obtained by performing feature extraction on the pixels in the reference image;
or,
the target feature is a feature obtained by performing feature extraction on the pixels in the target image and downsampling the extracted features at a preset downsampling rate, and the reference feature is a feature obtained by performing feature extraction on the pixels in the reference image and downsampling the extracted features at the preset downsampling rate.
As an optional implementation of the embodiments of the present disclosure, the offset acquisition unit is specifically configured to: concatenate the similarity feature and the target feature along the channel dimension to obtain an offset prediction feature; input the offset prediction feature into the offset prediction convolutional layer; and acquire the output of the offset prediction convolutional layer as the offset between the target feature and the reference feature.
As an optional implementation of the embodiments of the present disclosure, the target feature includes sub-target features of multiple spatial scales, the reference feature includes sub-reference features of multiple spatial scales, and the offset includes sub-offsets of multiple spatial scales;
the offset acquisition unit is specifically configured to acquire the sub-offsets of the sub-target features and the sub-reference features of the multiple spatial scales according to the sub-similarity features of the multiple spatial scales, the sub-target features of the multiple spatial scales, and the offset prediction convolutional layers corresponding to the multiple spatial scales.
As an optional implementation of the embodiments of the present disclosure, the processing unit is specifically configured to: input the reference feature into the deformable convolutional layer, and control the shape of the convolution kernel of the deformable convolutional layer through the offset; and
acquire the output of the deformable convolutional layer as the alignment result of the reference feature and the target feature.
As an optional implementation of the embodiments of the present disclosure, the target feature includes sub-target features of multiple spatial scales, the reference feature includes sub-reference features of multiple spatial scales, and the offset includes sub-offsets of multiple spatial scales;
the processing unit is specifically configured to: acquire the alignment results of the sub-target features and the sub-reference features of the multiple spatial scales according to the sub-offsets of the multiple spatial scales and the deformable convolutional layers corresponding to the multiple spatial scales; and align the reference image with the target image according to the alignment results of the sub-target features and the sub-reference features of the multiple spatial scales.
As an optional implementation of the embodiments of the present disclosure, the target image is the nth image frame of a video to be repaired, and the reference image is the (n+1)th image frame of the video to be repaired, where n is a positive integer.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to, when invoking the computer program, cause the electronic device to implement the image alignment method described in the first aspect or any optional implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a computing device, causes the computing device to implement the image alignment method described in the first aspect or any optional implementation of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product which, when run on a computer, causes the computer to implement the image alignment method described in the first aspect or any optional implementation of the first aspect.
Brief Description of the Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
In order to explain the technical solutions in the embodiments of the present disclosure or the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below; obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of the steps of an image alignment method provided by an embodiment of the present disclosure;
FIG. 2 is a first schematic diagram of related feature points corresponding to a feature point provided by an embodiment of the present disclosure;
FIG. 3 is a second schematic diagram of related feature points corresponding to a feature point provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a first spatial domain corresponding to a feature point provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a second spatial domain corresponding to a feature point provided by an embodiment of the present disclosure;
FIG. 6 is a first schematic flowchart of an image alignment method provided by an embodiment of the present disclosure;
FIG. 7 is a second schematic flowchart of the image alignment method provided by an embodiment of the present disclosure;
FIG. 8 is a third schematic flowchart of the image alignment method provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an image alignment device provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order that the above features and advantages of the present disclosure can be understood more clearly, the solutions of the present disclosure are further described below. It should be noted that, in the absence of conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways other than those described here; obviously, the embodiments in this specification are only a part, not all, of the embodiments of the present disclosure.
In the embodiments of the present disclosure, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present disclosure should not be construed as preferred or advantageous over other embodiments or designs; rather, the use of such words is intended to present related concepts in a concrete manner. In addition, in the description of the embodiments of the present disclosure, unless otherwise specified, "multiple" means two or more.
Although the traditional optical-flow-based image alignment approach can also achieve image alignment, computing the optical flow field is computationally very expensive, and the approach is therefore inefficient. To improve this low efficiency, a feature-based image alignment approach has been proposed in the related art, which specifically includes: acquiring the reference features of the reference image and the target features of the target image, predicting an offset according to the reference features and the target features, and then using the offset to control a deformable convolutional layer that aligns the reference features with the target features to obtain the final alignment result. Compared with optical-flow-based alignment, this feature-based approach does not need to compute the optical flow field, so it is more efficient and can align image features directly. However, because there is no initial value when predicting the offset from the reference features and the target features, the alignment result of the deformable convolutional layer will differ greatly from the true value of the alignment result when the image quality of the reference image and the target image is poor.
In view of this, the present disclosure provides an image alignment method and device, which are used to solve the problem in the related art that the lack of an initial value when predicting the offset causes the alignment result of the deformable convolutional layer to differ greatly from the true value of the alignment result.
The general inventive concept of the embodiments of the present disclosure is as follows: in practical applications, when the input image has undergone severe degradation (blur, haze, noise), the alignment result of the deformable convolutional layer is very unstable and prone to gradient explosion. The root of this instability lies in the early stage of training: because the initial value of the offset is uncertain, the initially predicted offset differs greatly from the actual true value. Therefore, to stabilize the training process, the embodiments of the present disclosure introduce the correlation layer from optical flow networks, and use the similarity feature between the target feature and the reference feature obtained by the correlation layer as a guide for the offset. Since the similarity feature is closely related to optical flow, guidance by the similarity feature can alleviate the problem of the initially predicted offset differing greatly from the actual true value.
本公开实施例提供了一种图像对齐方法,参照图1所示,该图像对齐方法包括如下步骤:
S101、获取目标特征和参考特征。
其中,所述目标特征包括目标图像中的像素点对应的特征点;所述参考特征包括参考图像中的像素点对应的特征点。
可选的,所述目标特征和所述参考特征可以为视频中的相邻图像帧。即,所述参考图像为待修复视频的第n个图像帧,所述参考图像为所述待修复视频的第n+1个图像帧,n为正整数。
S102、根据所述目标特征和所述参考特征,获取相似度特征。
其中,所述相似度特征包括:所述目标特征中的各个特征点与对应的相关特征点的相似度,所述目标特征中的特征点对应的相关特征点包括:所述参考特征中像素坐标与所述目标特征中的特征点的像素坐标相同和相邻的特征点。
具体的,本公开实施例中特征点的像素坐标是指特征点对应的像素点在所属图像中的像素坐标。
例如:参考图像中像素坐标为(1,1)的像素点I 11对应的特征点为F a11,则特征点F a11的像素坐标为(1,1)。
再例如:目标图像中像素坐标为(2,3)的像素点I 23对应的特征点为F b23,则特征点F b23的像素坐标为(2,3)。
需要说明的是,本公开实施例中的像素坐标与像素坐标是指两个像素同属于一个预设定的坐标范围,并限定于两个像素坐标之间不包含其它像素坐标。
作为本公开实施例一种可选的实施方式,所述目标特征中的第一特征点对应的相关特征点包括:第二特征点以及所述第二特征点的第一预设值的邻域内的特征点;
其中,所述第二特征点为所述参考特征中像素坐标与所述第一特征点的像素坐标相同的特征点。
设:第一预设值为d,则第一特征点对应的相关特征点,包括:所述参考特征中像素坐标与所述第一特征点的像素坐标相同的第二特征点,以及第二特征点的d*d的邻域内的特征点。
示例性的,d可以为9。即,第一特征点对应的相关特征点,包括:所述参考特征中像素坐标与所述第一特征点的像素坐标相同的第二特征点,以及第二特征点的9*9的邻域内的特征点
示例性的,参照图2所示,图2中以目标图像和参考图像的分辨率均为6*6、第一预设值为3为例,对目标特征中的特征点对应的相关特征点的进行说明。如图2所示,目标特征21中的特征点Fa33的像素坐标为(3,3),参考特征22中像素坐标为(3,3)的特征点为Fb33,因此,当第一特征点为Fa33时,第二特征点为Fb33,第二特征点Fb33的3*3的邻域内的特征点包括:Fb22、Fb23、Fb24、Fb32、Fb34、Fb42、Fb43、Fb44,因此目标特征21中的特征点Fa33对应的相关特征点包括参考特征中的9个特征点,该9个特征点分别为:Fb22、Fb23、Fb24、Fb32、Fb33、Fb34、Fb42、Fb43、Fb44。
示例性的,参照图3所示,图3中仍以目标图像和参考图像的分 辨率均为6*6、第一预设值为3为例,对目标特征中的特征点对应的相关特征点的进行说明。如图3所示,目标特征31中的特征点Fa46的像素坐标为(4,6),参考特征32中像素坐标为(4,6)的特征点为Fb46,因此,当第一特征点为Fa46时,第二特征点为Fb46,第二特征点Fb46的3*3的邻域内的特征点包括:Fb35、Fb36、Fb45、Fb55、Fb56,因此目标特征21中的特征点Fa46对应的相关特征点包括参考特征中的6个特征点,该6个特征点分别为:Fb35、Fb36、Fb45、Fb46、Fb55、Fb56。
根据上述相同方式,逐一将目标特征中的每一个特征点(Fa11、Fa12、Fa13……Fa65、Fa66)作为第一特征点,则可以确定目标特征中的每一个特征点对应的相关特征点。
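For readers who want to see this window lookup concretely, a minimal PyTorch sketch follows; the disclosure prescribes no framework or code, so the library choice and the function name relevant_points are assumptions of this illustration. It gathers, for every target-feature position, the d*d candidate relevant feature points from the reference feature, and zero padding yields the truncated border windows of Fig. 3.

```python
import torch
import torch.nn.functional as F

def relevant_points(f_ref: torch.Tensor, d: int = 3) -> torch.Tensor:
    """Collect the d*d window of candidate relevant feature points.

    f_ref: reference feature of shape (N, C, H, W).
    Returns a tensor of shape (N, C, d*d, H, W): slot j of dimension 2
    holds the j-th neighbor of the co-located reference feature point.
    Border windows are zero-padded, mirroring the truncated neighborhoods.
    """
    n, c, h, w = f_ref.shape
    cols = F.unfold(f_ref, kernel_size=d, padding=d // 2)  # (N, C*d*d, H*W)
    return cols.view(n, c, d * d, h, w)
```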
In the above embodiment, the second feature point whose pixel coordinates in the reference feature are the same as those of the first feature point, together with the feature points within the neighborhood of the first preset value of the second feature point, are determined as the relevant feature points corresponding to the first feature point. Compared with determining only the second feature point as the relevant feature point, this enlarges the receptive field when acquiring the similarity feature, thereby avoiding inaccurate similarity features when the ground-truth offset between the reference image and the target image is large.
Further, acquiring the similarity feature according to the target feature and the reference feature in the above step S102 includes the following steps a to d.
Step a: determining a first spatial domain corresponding to a third feature point in the target feature.
The first spatial domain is the spatial domain formed by the third feature point and the feature points within a neighborhood of a second preset value of the third feature point; the third feature point is any feature point in the target feature.
Suppose the second preset value is k; then the first spatial domain is the spatial domain formed by the third feature point and the feature points within the k*k neighborhood of the third feature point.
Exemplarily, referring to Fig. 4, the first spatial domain corresponding to a feature point in the target feature is illustrated by taking a 6*6 resolution for both the target image and the reference image and a second preset value of 3 as an example. As shown in Fig. 4, the feature points within the 3*3 neighborhood of the feature point Fa33 in the target feature 41 include: Fa22, Fa23, Fa24, Fa32, Fa34, Fa42, Fa43 and Fa44. Therefore, the first spatial domain corresponding to the feature point Fa33 is the spatial domain 400 formed by Fa22, Fa23, Fa24, Fa32, Fa33, Fa34, Fa42, Fa43 and Fa44.
Step b: determining a second spatial domain corresponding to a fourth feature point in the reference feature.
The second spatial domain is the spatial domain formed by the fourth feature point and the feature points within the neighborhood of the second preset value of the fourth feature point; the fourth feature point is a relevant feature point corresponding to the third feature point.
Likewise, suppose the second preset value is k; then the second spatial domain is the spatial domain formed by the fourth feature point and the feature points within the k*k neighborhood of the fourth feature point.
Exemplarily, referring to Fig. 5, the second spatial domain corresponding to a feature point in the reference feature is illustrated by taking a 6*6 resolution for both the target image and the reference image and a second preset value of 3 as an example. As shown in Fig. 5, for the feature point Fb22 in the reference feature 52, which is a relevant feature point corresponding to the feature point Fa33 in the target feature 41 shown in Fig. 4, the feature points within its 3*3 neighborhood include: Fb11, Fb12, Fb13, Fb21, Fb23, Fb31, Fb32 and Fb33. Therefore, the second spatial domain corresponding to the feature point Fb22 is the spatial domain 500 formed by Fb11, Fb12, Fb13, Fb21, Fb22, Fb23, Fb31, Fb32 and Fb33.
Step c: computing the outer product of the feature points in each feature group to acquire the outer product of each feature group.
Each feature group includes a feature point belonging to the first spatial domain and a feature point belonging to the second spatial domain, and the position of the feature point belonging to the first spatial domain within the first spatial domain is the same as the position of the feature point belonging to the second spatial domain within the second spatial domain.
The outer product in the embodiments of the present disclosure refers to the vector product of two feature vectors. [The worked example of this product appears in the source only as embedded formula images PCTCN2022093799-appb-000001 to -000003.]
Following Figs. 4 and 5, the feature points at identical positions within their respective spatial domains are: Fa22 and Fb11, Fa23 and Fb12, Fa24 and Fb13, Fa32 and Fb21, Fa33 and Fb22, Fa34 and Fb23, Fa42 and Fb31, Fa43 and Fb32, Fa44 and Fb33. Therefore, the outer products of the feature groups (Fa22, Fb11), (Fa23, Fb12), (Fa24, Fb13), (Fa32, Fb21), (Fa33, Fb22), (Fa34, Fb23), (Fa42, Fb31), (Fa43, Fb32) and (Fa44, Fb33) are computed, yielding Fa22×Fb11, Fa23×Fb12, Fa24×Fb13, Fa32×Fb21, Fa33×Fb22, Fa34×Fb23, Fa42×Fb31, Fa43×Fb32 and Fa44×Fb33.
Step d: summing the outer products of the feature groups to acquire the similarity between the third feature point and the fourth feature point.
Continuing the above example, the outer products of the feature groups are Fa22×Fb11, Fa23×Fb12, Fa24×Fb13, Fa32×Fb21, Fa33×Fb22, Fa34×Fb23, Fa42×Fb31, Fa43×Fb32 and Fa44×Fb33, so the similarity between the third feature point Fa33 and the fourth feature point Fb22 is: Fa22×Fb11 + Fa23×Fb12 + Fa24×Fb13 + Fa32×Fb21 + Fa33×Fb22 + Fa34×Fb23 + Fa42×Fb31 + Fa43×Fb32 + Fa44×Fb33.
That is, for a feature point x₁ in the target feature and a relevant feature point x₂ of x₁ in the reference feature, the similarity between x₁ and x₂ can be computed by the following formula:

c(x₁, x₂) = Σ f₁(x₁ + o) × f₂(x₂ + o), with o ∈ [-k/2, k/2] × [-k/2, k/2],

where c(x₁, x₂) is the similarity between the feature points x₁ and x₂, k is a constant, f₁(x₁ + o) denotes x₁ and the feature points within the k*k neighborhood of x₁, and f₂(x₂ + o) denotes x₂ and the feature points within the k*k neighborhood of x₂.
By the same method, taking each of the other relevant feature points corresponding to the third feature point in turn as the fourth feature point, the similarities between the third feature point and all of its corresponding relevant feature points can be acquired; then, taking each feature point in the target feature (Fa11, Fa12, Fa13, ..., Fa65, Fa66) in turn as the third feature point, the similarities between each feature point in the target feature and its corresponding relevant feature points are acquired, thereby acquiring the similarity feature.
When computing the similarity between the third feature point and the fourth feature point, the above embodiment first determines the first spatial domain corresponding to the third feature point and the second spatial domain corresponding to the fourth feature point, then computes the outer product of the feature points in each feature group to acquire the outer product of each feature group, and finally sums the outer products of the feature groups and takes the summation result as the similarity between the third feature point and the fourth feature point. Compared with directly computing the outer product of the third and fourth feature points and taking it as their similarity, the above embodiment enlarges the dimensions over which the similarity is acquired, thereby improving the robustness of acquiring the similarity feature.
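Steps a to d can be pictured with the following correlation-layer sketch, again under assumptions: PyTorch as the framework, the per-position "outer product" realized as a channel-wise dot product in the spirit of FlowNet-style correlation layers, and the function name correlation invented for the illustration. Here d is the first preset value (the displacement window) and k the second preset value (the patch size).

```python
import torch
import torch.nn.functional as F

def correlation(f_tgt: torch.Tensor, f_ref: torch.Tensor,
                d: int = 3, k: int = 3) -> torch.Tensor:
    """Similarity feature Fc of shape (N, d*d, H, W).

    f_tgt, f_ref: target/reference features of shape (N, C, H, W).
    For each of the d*d displacements, the per-position channel-wise
    product is summed over a k*k patch (steps c and d above).
    """
    n, c, h, w = f_tgt.shape
    r = d // 2
    f_ref_pad = F.pad(f_ref, [r, r, r, r])  # zero padding at the borders
    maps = []
    for dy in range(d):
        for dx in range(d):
            shifted = f_ref_pad[:, :, dy:dy + h, dx:dx + w]
            prod = (f_tgt * shifted).sum(dim=1, keepdim=True)  # dot product
            # k*k patch sum; avg_pool * k*k equals the zero-padded patch sum
            maps.append(F.avg_pool2d(prod, k, stride=1, padding=k // 2) * (k * k))
    return torch.cat(maps, dim=1)
```

For the 6*6 example above with d = k = 3, the channel for displacement (-1, -1) at position (3, 3) reproduces the nine-term sum acquired in step d for Fa33 and Fb22.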
S103: acquiring the offset of the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolution layer.
Optionally, one implementation of the above step S103 includes the following steps 1 to 3:
Step 1: concatenating the similarity feature and the target feature along the channel dimension to acquire an offset prediction feature.
In the embodiments of the present disclosure, the number of channels of a feature refers to the number of feature maps the feature contains; one channel of a feature is the feature map obtained by extracting the feature along a certain dimension, so a channel of a feature is a feature map in a specific sense. Concatenating the similarity feature and the target feature along the channel dimension to acquire the offset prediction feature means concatenating the feature maps of the similarity feature after the feature maps of the target feature in sequence, thereby obtaining an offset prediction feature that sequentially includes all the feature maps of the target feature and all the feature maps of the similarity feature.
Step 2: inputting the offset prediction feature into the offset prediction convolution layer.
Step 3: acquiring the output of the offset prediction convolution layer as the offset of the target feature and the reference feature.
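A minimal sketch of steps 1 to 3, assuming PyTorch, a single 3*3 offset prediction convolution, and illustrative shapes (the disclosure fixes neither the layer's kernel size nor its depth):

```python
import torch
import torch.nn as nn

N, C, H, W, d = 1, 64, 32, 32, 3
Fc = torch.randn(N, d * d, H, W)   # similarity feature
F2 = torch.randn(N, C, H, W)       # target feature

# Step 1: channel-wise concatenation -> offset prediction feature Ft
Ft = torch.cat([F2, Fc], dim=1)    # (N, C + d*d, H, W)

# Steps 2-3: offset prediction; 2*3*3 output channels parameterize a 3*3
# deformable kernel (one x- and one y-offset per kernel sample)
offset_conv = nn.Conv2d(C + d * d, 2 * 3 * 3, kernel_size=3, padding=1)
Off = offset_conv(Ft)              # offset of the target and reference features
```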
S104: aligning the reference image with the target image according to the offset and a deformable convolution layer.
Optionally, one implementation of the above step S104 includes the following steps I to III:
Step I: inputting the reference feature into the deformable convolution layer, and controlling the shape of the convolution kernel of the deformable convolution layer through the offset.
Step II: acquiring the output of the deformable convolution layer as the alignment result of the reference feature and the target feature.
Step III: aligning the reference image with the target image according to the alignment result of the reference feature and the target feature.
To summarize, denote the reference feature as F1, the target feature as F2, the similarity feature as Fc, the offset prediction feature as Ft, the offset as Off, and the alignment result as Fa; call the module that acquires the similarity feature the correlation layer and the module that acquires the offset prediction feature the concatenation layer. The flow of the image alignment method provided by the above embodiments is then as shown in Fig. 6:
First, the reference feature F1 and the target feature F2 are input into the correlation layer 61, and the output of the correlation layer 61 is acquired as the similarity feature Fc.
Second, the similarity feature Fc and the target feature F2 are input into the concatenation layer 62, and the output of the concatenation layer 62 is acquired as the offset prediction feature Ft.
Third, the offset prediction feature Ft is input into the offset prediction convolution layer 63, and the output of the offset prediction convolution layer 63 is acquired as the offset Off.
Finally, the offset Off and the reference feature F1 are input into the deformable convolution layer 64, and the output of the deformable convolution layer 64 is acquired as the alignment result Fa.
Further, suppose the feature dimensions of the reference feature F1 and the target feature F2 are C×H×W; then the feature dimension of the similarity feature Fc is (d*d)×H×W, and the feature dimension of the offset prediction feature Ft is (d*d+C)×H×W, where d is the first preset value.
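Chaining the stages of Fig. 6 end to end might look as follows. The sketch reuses the hypothetical correlation helper above and uses torchvision's DeformConv2d for the deformable convolution layer 64; the 3*3 deformable kernel, the single-layer offset predictor and the class name AlignBlock are assumptions of this illustration, not the disclosure's prescribed architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AlignBlock(nn.Module):
    """Correlation layer -> concatenation -> offset conv -> deformable conv."""

    def __init__(self, c: int, d: int = 3, k: int = 3):
        super().__init__()
        self.d, self.k = d, k
        self.offset_conv = nn.Conv2d(c + d * d, 2 * 3 * 3, 3, padding=1)
        self.deform = DeformConv2d(c, c, kernel_size=3, padding=1)

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        fc = correlation(f2, f1, self.d, self.k)  # similarity feature Fc
        ft = torch.cat([f2, fc], dim=1)           # offset prediction feature Ft
        off = self.offset_conv(ft)                # offset Off
        return self.deform(f1, off)               # alignment result Fa

# Usage: align a (1, 64, 32, 32) reference feature to the target feature.
f1, f2 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
fa = AlignBlock(c=64)(f1, f2)                     # -> (1, 64, 32, 32)
```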
The image alignment method provided by the embodiments of the present disclosure first acquires a target feature including feature points corresponding to pixels in the target image and a reference feature including feature points corresponding to pixels in the reference image; then acquires, according to the target feature and the reference feature, a similarity feature including the similarity between each feature point in the target feature and its corresponding relevant feature points; next acquires the offset of the target feature and the reference feature according to the similarity feature, the target feature, and the offset prediction convolution layer; and finally aligns the reference image with the target image according to the offset and the deformable convolution layer. Since the similarity between each feature point in the target feature and its corresponding relevant feature points is strongly correlated with the optical flow field between the target image and the reference image, using the similarity feature as guidance for the offset allows the offset of the target feature and the reference feature to be predicted more accurately, thereby solving the problem that the alignment result of the deformable convolution layer deviates considerably from the ground-truth alignment result.
As an optional implementation of the embodiments of the present disclosure, the target feature is a feature acquired by performing feature extraction on the pixels in the target image, and the reference feature is a feature acquired by performing feature extraction on the pixels in the reference image.
That is, the features extracted from the target image and the reference image are used directly as the target feature and the reference feature, respectively.
As an optional implementation of the embodiments of the present disclosure, the target feature is a feature obtained by performing feature extraction on the pixels in the target image and down-sampling the extracted feature at a preset down-sampling rate, and the reference feature is a feature obtained by performing feature extraction on the pixels in the reference image and down-sampling the extracted feature at the preset down-sampling rate.
Exemplarily, the preset down-sampling rate may be 1/16.
That is, the feature extracted from the target image is down-sampled to 1/16 of the original feature and used as the target feature, and the feature extracted from the reference image is down-sampled to 1/16 of the original feature and used as the reference feature.
In the above embodiments, when the first preset value is set large, the offset prediction convolution layer can obtain a sufficiently large receptive field, but the computation required for the similarity feature also grows, which degrades image alignment efficiency. To solve this problem, the above embodiment down-samples the features extracted from the target image and from the reference image at the preset down-sampling rate, which preserves a sufficiently large receptive field while reducing the excessive computation of the similarity feature, thereby improving image alignment efficiency.
In addition, after the target feature and the reference feature are down-sampled at the preset down-sampling rate and the similarity feature is acquired, the similarity feature needs to be concatenated with the target feature along the channel dimension to acquire the offset prediction feature, and features concatenated along the channel dimension must have the same dimensions; therefore, the similarity feature further needs to be up-sampled to the same dimensions as the target feature. That is, referring to Fig. 7, on the basis of the flow shown in Fig. 6, the image alignment method provided by the embodiments of the present disclosure further includes: performing a down-sampling operation on the target feature F2 and the reference feature F1 (shown by downward arrows in Fig. 7), and performing an up-sampling operation on the similarity feature (shown by an upward arrow in Fig. 7).
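One plausible realization of these two resampling operations, sketched with torch.nn.functional.interpolate; the bilinear mode is an assumption (the disclosure does not specify the resampling method), and the 1/16 rate reuses the example above:

```python
import torch
import torch.nn.functional as F

f1 = torch.randn(1, 64, 128, 128)  # reference feature F1
f2 = torch.randn(1, 64, 128, 128)  # target feature F2
rate = 1 / 16                      # preset down-sampling rate

# Down-sample both features before the correlation layer.
f1_small = F.interpolate(f1, scale_factor=rate, mode='bilinear', align_corners=False)
f2_small = F.interpolate(f2, scale_factor=rate, mode='bilinear', align_corners=False)

# Correlation at the low resolution, then up-sample the similarity feature
# back to the target feature's resolution before channel concatenation.
fc_small = correlation(f2_small, f1_small, d=3, k=3)
fc = F.interpolate(fc_small, size=f2.shape[-2:], mode='bilinear', align_corners=False)
```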
Further, the embodiments of the present disclosure may also adopt a cascaded pyramid architecture to progressively align the target feature and the reference feature at multiple different spatial scales.
That is, the target feature includes sub-target features at multiple spatial scales, the reference feature includes sub-reference features at multiple spatial scales, and the similarity feature includes sub-similarity features at multiple spatial scales; the above step S103 (acquiring the offset of the target feature and the reference feature according to the similarity feature, the target feature, and the offset prediction convolution layer) then includes:
acquiring the sub-offsets of the sub-target features and the sub-reference features at the multiple spatial scales according to the sub-similarity features at the multiple spatial scales, the sub-target features at the multiple spatial scales, and the offset prediction convolution layers corresponding to the multiple spatial scales.
Since the above embodiment progressively acquires the offset between the features of the reference image and the features of the target image at multiple spatial scales, it can acquire this offset more accurately.
Where the target feature includes sub-target features at multiple spatial scales, the reference feature includes sub-reference features at multiple spatial scales, and the offset includes sub-offsets at multiple spatial scales, the above step S104 (aligning the reference image with the target image according to the offset and the deformable convolution layer) includes:
acquiring the alignment results of the sub-target features and the sub-reference features at the multiple spatial scales according to the sub-offsets at the multiple spatial scales and the deformable convolution layers corresponding to the multiple spatial scales; and
aligning the reference image with the target image according to the alignment results of the sub-target features and the sub-reference features at the multiple spatial scales.
Since the above embodiment progressively aligns the reference image with the target image at multiple spatial scales, it can improve the accuracy of the alignment result of the reference image and the target image.
Optionally, aligning the reference image with the target image according to the offset and the deformable convolution layer includes:
acquiring the target offset of the n-th-level spatial scale according to the sub-offset of the n-th-level spatial scale and the sub-offset of the (n-1)-th-level spatial scale;
acquiring the alignment result of the n-th-level spatial scale according to the target offset of the n-th-level spatial scale and the deformable convolution layer corresponding to the n-th-level spatial scale;
acquiring the target alignment result of the n-th-level spatial scale according to the alignment result of the n-th-level spatial scale and the target alignment result of the (n-1)-th-level spatial scale; and
acquiring the alignment result of aligning the reference image with the target image according to the target alignment result of the 1st-level spatial scale;
where the (n-1)-th-level spatial scale is smaller than the n-th-level spatial scale.
Referring to Fig. 8, a 3-level cascaded pyramid architecture is taken as an example to illustrate the progressive alignment of the target feature and the reference feature at different spatial scales.
First, the sub-reference features F1_1, F1_2 and F1_3 and the sub-target features F2_1, F2_2 and F2_3 are acquired at the spatial scales corresponding to the 1st, 2nd and 3rd levels.
Then, starting from the 3rd-level spatial scale, the sub-offset Off_3 of the sub-reference feature F1_3 and the sub-target feature F2_3, together with their alignment result Fa_3, is acquired by the image alignment method provided by the above embodiments. Since the 3rd-level spatial scale has no preceding level, the sub-offset Off_3 acts directly on the deformable convolution layer corresponding to the 3rd-level spatial scale, and the alignment result of the 3rd-level spatial scale is the same as its target alignment result.
Next, the sub-offset Off_2 of the 2nd-level sub-reference feature F1_2 and sub-target feature F2_2 is acquired by the image alignment method provided by the above embodiments; the target offset of the 2nd-level spatial scale is generated according to the sub-offsets Off_3 and Off_2; the target offset of the 2nd-level spatial scale is then input into the deformable convolution layer of the 2nd-level spatial scale to acquire the 2nd-level alignment result Fa_2; and the target alignment result of the 2nd-level spatial scale is acquired by combining the alignment result Fa_2 with the target alignment result Fa_3 of the 3rd-level spatial scale.
Finally, the sub-offset Off_1 of the 1st-level sub-reference feature F1_1 and sub-target feature F2_1 is acquired by the image alignment method provided by the above embodiments; the target offset of the 1st-level spatial scale is generated according to the sub-offsets Off_2 and Off_1; the target offset of the 1st-level spatial scale is then input into the deformable convolution layer of the 1st-level spatial scale to acquire the 1st-level alignment result Fa_1; and the target alignment result of the 1st-level spatial scale (the final alignment result) is acquired by combining the alignment result Fa_1 with the target alignment result of the 2nd-level spatial scale.
It should be noted that Fig. 8 illustrates, as an example, progressive alignment of the target feature and the reference feature at 3 different spatial scales, but the embodiments of the present disclosure are not limited thereto. On the basis of the above embodiments, progressive alignment may also be performed at other numbers of different spatial scales, for example at 2 different spatial scales or at 5 different spatial scales; the embodiments of the present disclosure do not limit this.
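The cascade of Fig. 8 might be sketched as follows, reusing the hypothetical correlation helper above. Since the text does not detail how the offsets and alignment results of adjacent levels are combined, the up-sample-and-add for offsets (with a factor of 2, assuming a dyadic pyramid) and the convolutional fusion of aligned features are assumptions of this illustration; following Fig. 8's numbering, level 3 is the coarsest and level 1 the finest.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class PyramidAlign(nn.Module):
    """3-level cascaded pyramid alignment of Fig. 8 (level 3 coarsest)."""

    def __init__(self, c: int, d: int = 3, levels: int = 3):
        super().__init__()
        self.d = d
        self.offset_convs = nn.ModuleList(
            nn.Conv2d(c + d * d, 2 * 3 * 3, 3, padding=1) for _ in range(levels))
        self.deforms = nn.ModuleList(
            DeformConv2d(c, c, 3, padding=1) for _ in range(levels))
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * c, c, 3, padding=1) for _ in range(levels - 1))

    def forward(self, f1_pyr, f2_pyr):
        """f1_pyr / f2_pyr: sub-features indexed [level 1 (finest), ..., level 3]."""
        off_prev, fa_prev = None, None
        for i in reversed(range(len(f1_pyr))):           # level 3 -> level 1
            f1, f2 = f1_pyr[i], f2_pyr[i]
            fc = correlation(f2, f1, self.d)
            off = self.offset_convs[i](torch.cat([f2, fc], dim=1))  # sub-offset
            if off_prev is not None:
                # target offset: combine with the coarser level's offset
                up = F.interpolate(off_prev, size=off.shape[-2:],
                                   mode='bilinear', align_corners=False)
                off = off + 2.0 * up                     # x2: pixels double per level
            fa = self.deforms[i](f1, off)                # alignment result
            if fa_prev is not None:
                up = F.interpolate(fa_prev, size=fa.shape[-2:],
                                   mode='bilinear', align_corners=False)
                fa = self.fuse[i](torch.cat([fa, up], dim=1))  # target alignment result
            off_prev, fa_prev = off, fa
        return fa                                        # final alignment result
```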
Based on the same inventive concept, as an implementation of the above method, an embodiment of the present disclosure further provides an image alignment apparatus. This apparatus embodiment corresponds to the foregoing method embodiment; for ease of reading, this apparatus embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be clear that the image alignment apparatus in this embodiment can correspondingly implement all the content of the foregoing method embodiment.
An embodiment of the present disclosure provides an image alignment apparatus. Fig. 9 is a schematic structural diagram of the image alignment apparatus. As shown in Fig. 9, the image alignment apparatus 900 includes:
a feature acquisition unit 91 configured to acquire a target feature and a reference feature, the target feature including feature points corresponding to pixels in a target image, and the reference feature including feature points corresponding to pixels in a reference image;
a similarity acquisition unit 92 configured to acquire a similarity feature according to the target feature and the reference feature, the similarity feature including the similarity between each feature point in the target feature and its corresponding relevant feature points, where the relevant feature points corresponding to a feature point in the target feature include the feature points in the reference feature whose pixel coordinates are the same as or adjacent to the pixel coordinates of the feature point in the target feature;
an offset acquisition unit 93 configured to acquire the offset of the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolution layer; and
a processing unit 94 configured to align the reference image with the target image according to the offset and a deformable convolution layer.
As an optional implementation of the embodiments of the present disclosure, the relevant feature points corresponding to a first feature point in the target feature include: a second feature point and the feature points within a neighborhood of a first preset value of the second feature point;
where the second feature point is the feature point in the reference feature whose pixel coordinates are the same as the pixel coordinates of the first feature point.
As an optional implementation of the embodiments of the present disclosure, the similarity acquisition unit 92 is specifically configured to: determine a first spatial domain corresponding to a third feature point in the target feature, the first spatial domain being the spatial domain formed by the third feature point and the feature points within a neighborhood of a second preset value of the third feature point; determine a second spatial domain corresponding to a fourth feature point in the reference feature, the second spatial domain being the spatial domain formed by the fourth feature point and the feature points within the neighborhood of the second preset value of the fourth feature point, the fourth feature point being a relevant feature point corresponding to the third feature point; compute the outer product of the feature points in each feature group to acquire the outer product of each feature group, each feature group including a feature point belonging to the first spatial domain and a feature point belonging to the second spatial domain, where the position of the feature point belonging to the first spatial domain within the first spatial domain is the same as the position of the feature point belonging to the second spatial domain within the second spatial domain; and sum the outer products of the feature groups to acquire the similarity between the third feature point and the fourth feature point.
As an optional implementation of the embodiments of the present disclosure, the target feature is a feature acquired by performing feature extraction on the pixels in the target image, and the reference feature is a feature acquired by performing feature extraction on the pixels in the reference image;
or,
the target feature is a feature obtained by performing feature extraction on the pixels in the target image and down-sampling the extracted feature at a preset down-sampling rate, and the reference feature is a feature obtained by performing feature extraction on the pixels in the reference image and down-sampling the extracted feature at the preset down-sampling rate.
As an optional implementation of the embodiments of the present disclosure, the offset acquisition unit 93 is specifically configured to: concatenate the similarity feature and the target feature along the channel dimension to acquire an offset prediction feature; input the offset prediction feature into the offset prediction convolution layer; and acquire the output of the offset prediction convolution layer as the offset of the target feature and the reference feature.
As an optional implementation of the embodiments of the present disclosure, the target feature includes sub-target features at multiple spatial scales, the reference feature includes sub-reference features at multiple spatial scales, and the similarity feature includes sub-similarity features at multiple spatial scales;
the offset acquisition unit 93 is specifically configured to acquire the sub-offsets of the sub-target features and the sub-reference features at the multiple spatial scales according to the sub-similarity features at the multiple spatial scales, the sub-target features at the multiple spatial scales, and the offset prediction convolution layers corresponding to the multiple spatial scales.
As an optional implementation of the embodiments of the present disclosure, the processing unit 94 is specifically configured to input the reference feature into the deformable convolution layer, and control the shape of the convolution kernel of the deformable convolution layer through the offset;
and acquire the output of the deformable convolution layer as the alignment result of the reference feature and the target feature.
As an optional implementation of the embodiments of the present disclosure, the target feature includes sub-target features at multiple spatial scales, the reference feature includes sub-reference features at multiple spatial scales, and the offset includes sub-offsets at multiple spatial scales;
the processing unit 94 is specifically configured to acquire the alignment results of the sub-target features and the sub-reference features at the multiple spatial scales according to the sub-offsets at the multiple spatial scales and the deformable convolution layers corresponding to the multiple spatial scales; and align the reference image with the target image according to the alignment results of the sub-target features and the sub-reference features at the multiple spatial scales.
As an optional implementation of the embodiments of the present disclosure, the reference image is the n-th image frame of a video to be restored, and the target image is the (n+1)-th image frame of the video to be restored, where n is a positive integer.
The image alignment apparatus provided by this embodiment can perform the image alignment method provided by the above method embodiments; its implementation principle and technical effects are similar and are not repeated here.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device. Fig. 10 is a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure. As shown in Fig. 10, the electronic device provided by this embodiment includes: a memory 101 and a processor 102, the memory 101 being configured to store a computer program, and the processor 102 being configured to execute, when invoking the computer program, the image alignment method provided by the above embodiments.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the computing device to implement the image alignment method provided by the above embodiments.
An embodiment of the present disclosure further provides a computer program product which, when run on a computer, causes the computer to implement the image alignment method provided by the above embodiments.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code.
The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The memory may include non-permanent memory in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable storage media. A storage medium may implement information storage by any method or technology, and the information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements for some or all of the technical features therein; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (13)

  1. An image alignment method, comprising:
    acquiring a target feature and a reference feature, wherein the target feature comprises feature points corresponding to pixels in a target image, and the reference feature comprises feature points corresponding to pixels in a reference image;
    acquiring a similarity feature according to the target feature and the reference feature, wherein the similarity feature comprises the similarity between each feature point in the target feature and its corresponding relevant feature points, and the relevant feature points corresponding to a feature point in the target feature comprise the feature points in the reference feature whose pixel coordinates are the same as or adjacent to the pixel coordinates of the feature point in the target feature;
    acquiring an offset of the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolution layer; and
    aligning the reference image with the target image according to the offset and a deformable convolution layer.
  2. The method according to claim 1, wherein the relevant feature points corresponding to a first feature point in the target feature comprise: a second feature point and the feature points within a neighborhood of a first preset value of the second feature point; and
    wherein the second feature point is the feature point in the reference feature whose pixel coordinates are the same as the pixel coordinates of the first feature point.
  3. The method according to claim 1, wherein acquiring the similarity feature according to the target feature and the reference feature comprises:
    determining a first spatial domain corresponding to a third feature point in the target feature, the first spatial domain being the spatial domain formed by the third feature point and the feature points within a neighborhood of a second preset value of the third feature point;
    determining a second spatial domain corresponding to a fourth feature point in the reference feature, the second spatial domain being the spatial domain formed by the fourth feature point and the feature points within the neighborhood of the second preset value of the fourth feature point, wherein the fourth feature point is a relevant feature point corresponding to the third feature point;
    computing the outer product of the feature points in each feature group to acquire the outer product of each feature group, wherein each feature group comprises a feature point belonging to the first spatial domain and a feature point belonging to the second spatial domain, and the position of the feature point belonging to the first spatial domain within the first spatial domain is the same as the position of the feature point belonging to the second spatial domain within the second spatial domain; and
    summing the outer products of the feature groups to acquire the similarity between the third feature point and the fourth feature point.
  4. The method according to claim 1, wherein
    the target feature is a feature acquired by performing feature extraction on the pixels in the target image, and the reference feature is a feature acquired by performing feature extraction on the pixels in the reference image;
    or,
    the target feature is a feature obtained by performing feature extraction on the pixels in the target image and down-sampling the extracted feature at a preset down-sampling rate, and the reference feature is a feature obtained by performing feature extraction on the pixels in the reference image and down-sampling the extracted feature at the preset down-sampling rate.
  5. The method according to any one of claims 1-4, wherein acquiring the offset of the target feature and the reference feature according to the similarity feature, the target feature, and the offset prediction convolution layer comprises:
    concatenating the similarity feature and the target feature along the channel dimension to acquire an offset prediction feature;
    inputting the offset prediction feature into the offset prediction convolution layer; and
    acquiring the output of the offset prediction convolution layer as the offset of the target feature and the reference feature.
  6. The method according to any one of claims 1-4, wherein the target feature comprises sub-target features at multiple spatial scales, the reference feature comprises sub-reference features at multiple spatial scales, and the similarity feature comprises sub-similarity features at multiple spatial scales;
    wherein acquiring the offset of the target feature and the reference feature according to the similarity feature, the target feature, and the offset prediction convolution layer comprises:
    acquiring sub-offsets of the sub-target features and the sub-reference features at the multiple spatial scales according to the sub-similarity features at the multiple spatial scales, the sub-target features at the multiple spatial scales, and the offset prediction convolution layers corresponding to the multiple spatial scales.
  7. The method according to any one of claims 1-4, wherein aligning the reference image with the target image according to the offset and the deformable convolution layer comprises:
    inputting the reference feature into the deformable convolution layer, and controlling the shape of the convolution kernel of the deformable convolution layer through the offset;
    acquiring the output of the deformable convolution layer as the alignment result of the reference feature and the target feature; and
    aligning the reference image with the target image according to the alignment result of the reference feature and the target feature.
  8. The method according to any one of claims 1-4, wherein the target feature comprises sub-target features at multiple spatial scales, the reference feature comprises sub-reference features at multiple spatial scales, and the offset comprises sub-offsets at multiple spatial scales;
    wherein aligning the reference image with the target image according to the offset and the deformable convolution layer comprises:
    acquiring alignment results of the sub-target features and the sub-reference features at the multiple spatial scales according to the sub-offsets at the multiple spatial scales and the deformable convolution layers corresponding to the multiple spatial scales; and
    aligning the reference image with the target image according to the alignment results of the sub-target features and the sub-reference features at the multiple spatial scales.
  9. The method according to any one of claims 1-4, wherein the reference image is the n-th image frame of a video to be restored, and the target image is the (n+1)-th image frame of the video to be restored, where n is a positive integer.
  10. An image alignment apparatus, comprising:
    a feature acquisition unit configured to acquire a target feature and a reference feature, wherein the target feature comprises feature points corresponding to pixels in a target image, and the reference feature comprises feature points corresponding to pixels in a reference image;
    a similarity acquisition unit configured to acquire a similarity feature according to the target feature and the reference feature, wherein the similarity feature comprises the similarity between each feature point in the target feature and its corresponding relevant feature points, and the relevant feature points corresponding to a feature point in the target feature comprise the feature points in the reference feature whose pixel coordinates are the same as or adjacent to the pixel coordinates of the feature point in the target feature;
    an offset acquisition unit configured to acquire an offset of the target feature and the reference feature according to the similarity feature, the target feature, and an offset prediction convolution layer; and
    a processing unit configured to align the reference image with the target image according to the offset and a deformable convolution layer.
  11. An electronic device, comprising a memory and a processor, the memory being configured to store a computer program, and the processor being configured to, when invoking the computer program, cause the electronic device to implement the image alignment method according to any one of claims 1-9.
  12. A computer-readable storage medium storing a computer program which, when executed by a computing device, causes the computing device to implement the image alignment method according to any one of claims 1-9.
  13. A computer program product which, when run on a computer, causes the computer to implement the image alignment method according to any one of claims 1-9.
PCT/CN2022/093799 2021-05-21 2022-05-19 Image alignment method and apparatus WO2022242713A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110557632.8 2021-05-21
CN202110557632.8A CN115393405A (zh) Image alignment method and apparatus

Publications (1)

Publication Number Publication Date
WO2022242713A1 true WO2022242713A1 (zh) 2022-11-24

Family

ID=84113826

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/093799 WO2022242713A1 (zh) 2021-05-21 2022-05-19 一种图像对齐方法及装置

Country Status (2)

Country Link
CN (1) CN115393405A (zh)
WO (1) WO2022242713A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016177259A1 (zh) * 2015-05-07 2016-11-10 Alibaba Group Holding Limited Similar image recognition method and device
CN107527360A (zh) * 2017-08-23 2017-12-29 Vivo Mobile Communication Co., Ltd. Image alignment method and mobile terminal
CN107633526A (zh) * 2017-09-04 2018-01-26 Tencent Technology (Shenzhen) Co., Ltd. Image tracking point acquisition method and device, and storage medium
CN111914878A (zh) * 2020-06-16 2020-11-10 Beijing Megvii Technology Co., Ltd. Feature point tracking training and tracking methods and apparatus, electronic device, and storage medium
CN111915484A (zh) * 2020-07-06 2020-11-10 Tianjin University Reference-image-guided super-resolution method based on dense matching and adaptive fusion

Also Published As

Publication number Publication date
CN115393405A (zh) 2022-11-25

Similar Documents

Publication Publication Date Title
Tang et al. Learning guided convolutional network for depth completion
US8160364B2 (en) System and method for image registration based on variable region of interest
US11321937B1 (en) Visual localization method and apparatus based on semantic error image
US20200210708A1 (en) Method and device for video classification
Moulon et al. Adaptive structure from motion with a contrario model estimation
US11429805B2 (en) System and method for deep machine learning for computer vision applications
Shapira et al. Multiple histogram matching
WO2022141178A1 (zh) Image processing method and apparatus
US20220391632A1 (en) System and method for deep machine learning for computer vision applications
US20190279022A1 (en) Object recognition method and device thereof
WO2023202695A1 (zh) Data processing method and apparatus, device, and medium
Ni et al. Pats: Patch area transportation with subdivision for local feature matching
de Lima et al. Parallel hashing-based matching for real-time aerial image mosaicing
WO2022242713A1 (zh) Image alignment method and apparatus
CN104077764A (zh) Panorama synthesis method based on image stitching
Sun et al. Decoupled feature pyramid learning for multi-scale object detection in low-altitude remote sensing images
CN113808033A (zh) Image document correction method, system, terminal, and medium
Zhang et al. The farther the better: Balanced stereo matching via depth-based sampling and adaptive feature refinement
Wang et al. A real-time correction and stitching algorithm for underwater fisheye images
CN112348057A (zh) Target recognition method and apparatus based on a YOLO network
Kanaeva et al. Camera pose and focal length estimation using regularized distance constraints
WO2019024723A1 (zh) Method and apparatus for processing feature point matching results
CN115205112A (zh) Model training method and apparatus for super-resolution of real complex scene images
CN107274430B (zh) Object motion trajectory prediction method and apparatus
US20230377177A1 (en) Focal stack alignment method and depth estimation method using the same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22804033

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18562821

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22804033

Country of ref document: EP

Kind code of ref document: A1