WO2022095596A1

WO2022095596A1 - Image alignment method, image alignment apparatus and terminal device

Info

Publication number: WO2022095596A1
Application number: PCT/CN2021/117471
Authority: WO
Inventors: 林枝叶
Original assignee: Oppo广东移动通信有限公司
Priority date: 2020-11-09
Filing date: 2021-09-09
Publication date: 2022-05-12
Also published as: CN112348863B; CN112348863A

Abstract

Provided is an image alignment method, comprising: for each first grid area obtained by means of pre-division in a first modal image, determining at least two first candidate matching points in the first grid area; for each first candidate matching point in the first grid area, searching, in a second modal image, for a first target pixel point corresponding to the first candidate matching point, wherein cross-correlation information between the first target pixel point and the corresponding first candidate matching point conforms to a pre-set cross-correlation condition; determining a matching point pair between the first modal image and the second modal image according to found first target pixel points; according to the matching point pair, obtaining a grid transformation matrix between the first grid area and a second grid area that corresponds to the first grid area in the second modal image; and according to each grid transformation matrix, transforming the second modal image into a target image that is aligned relative to the first modal image.

Description

Image alignment method, image alignment device and terminal device

This application claims priority to Chinese patent application No. 202011237926.4, entitled "Image Alignment Method, Image Alignment Device and Terminal Equipment" and filed with the China Patent Office on November 9, 2020, the entire contents of which are incorporated into this application by reference.

Technical field

The present application belongs to the technical field of image processing, and in particular, relates to an image alignment method, an image alignment apparatus, a terminal device and a computer-readable storage medium.

Background

Image alignment is a fundamental technique in image processing and is used in many image processing tasks.

For example, terminals such as mobile phones, AR glasses, and virtual reality devices often integrate multiple cameras, and different cameras may use different imaging principles. For example, a terminal may carry both an infrared camera and an RGB camera. Images collected by cameras with different imaging principles can be regarded as images of different modalities, and such images need to be aligned before they can be stitched and fused. In addition, in medical imaging, remote sensing, and other application fields, image alignment is also needed to stitch and fuse images of different modalities, such as Computed Tomography (CT) images and Magnetic Resonance Imaging (MRI) images.

Summary of the invention

The embodiments of the present application provide an image alignment method, an image alignment device, a terminal device, and a computer-readable storage medium, which can solve the problem that existing methods cannot find accurate matching point pairs between images of different modalities, which results in poor alignment accuracy between images of different modalities.

In a first aspect, an embodiment of the present application provides an image alignment method, including:

for each first grid area pre-divided in the first modal image, determining at least two first candidate matching points in the first grid area;

for each first candidate matching point in the first grid area, searching the second modal image for a first target pixel point corresponding to the first candidate matching point, wherein the cross-correlation information between the first target pixel point and the corresponding first candidate matching point meets a preset cross-correlation condition;

if the first target pixel point corresponding to the first candidate matching point is found in the second modal image, using the first candidate matching point and the first target pixel point corresponding to it as a matching point pair between the first modal image and the second modal image;

according to the matching point pairs, obtaining a grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image, wherein the position of the first grid area in the first modal image is the same as the position of the corresponding second grid area in the second modal image; and

transforming the second modal image, according to the respective grid transformation matrices, into a target image aligned relative to the first modal image.

In a second aspect, an embodiment of the present application provides an image alignment device, including:

a determining module, configured to determine, for each first grid area pre-divided in the first modal image, at least two first candidate matching points in the first grid area;

a search module, configured to search, for each first candidate matching point in the first grid area, the second modal image for a first target pixel point corresponding to the first candidate matching point, wherein the cross-correlation information between the first target pixel point and the corresponding first candidate matching point meets a preset cross-correlation condition;

a first processing module, configured to, if the first target pixel point corresponding to the first candidate matching point is found in the second modal image, use the first candidate matching point and the first target pixel point corresponding to it as a matching point pair between the first modal image and the second modal image;

a second processing module, configured to obtain, according to the matching point pairs, a grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image, wherein the position of the first grid area in the first modal image is the same as the position of the corresponding second grid area in the second modal image; and

a transformation module, configured to transform the second modal image, according to each grid transformation matrix, into a target image aligned relative to the first modal image.

In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, a display, and a computer program stored in the memory and executable on the processor, wherein the processor implements the image alignment method described in the first aspect when executing the computer program.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the image alignment method described in the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer program product that, when the computer program product runs on a terminal device, enables the terminal device to execute the image alignment method described above in the first aspect.

Description of the drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments or the prior art. Obviously, the drawings in the following description show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without any creative effort.

FIG. 1 is a schematic flowchart of an image alignment method provided by an embodiment of the present application;

FIG. 2 is an exemplary schematic diagram of a distribution manner of a first candidate matching point in a first grid area provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of step S102 provided by an embodiment of the present application;

FIG. 4 is an exemplary schematic diagram of aligning the first modality image and the second modality image provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an image alignment apparatus provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

Detailed description

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It is to be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.

It will also be understood that, as used in this specification and the appended claims, the term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to the determination", "once the [described condition or event] is detected" or "in response to detection of the [described condition or event]".

References in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", "in still other embodiments", etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise. The terms "including", "comprising", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.

The image alignment method provided by the embodiments of the present application can be applied to terminal devices such as servers, desktop computers, mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA). The embodiments of the present application place no restriction on the specific type of the terminal device.

When performing image alignment, it is often necessary to detect and match feature points between the images to be aligned before determining the transformation matrix between the images. At present, traditional image alignment methods often detect and match image feature points with algorithms such as Scale-Invariant Feature Transform (SIFT), Features from Accelerated Segment Test (FAST), and Speeded Up Robust Features (SURF). However, these feature point detection algorithms all rely on the consistency of gradient directions in structurally similar regions of the images. Because of the different imaging principles, pixel values in structurally similar regions of images of different modalities may change differently along the gradient direction, and the gradient directions may even be opposite. As a result, existing feature point extraction methods cannot detect and match feature points accurately between images of different modalities, which lowers the accuracy of image alignment.

In the embodiments of the present application, by contrast, the similarity between pixels is measured through cross-correlation information in order to find accurate matching point pairs between the first modal image and the second modal image. This reduces the interference caused by differences in gradient direction between images of different modalities in structurally similar regions, and accordingly helps ensure the alignment accuracy of the final target image.

Specifically, FIG. 1 shows a flowchart of an image alignment method provided by an embodiment of the present application, and the image alignment method can be applied to a terminal device.

As shown in Figure 1, the image alignment method may include:

Step S101, for each first grid area pre-divided in the first modal image, determine at least two first candidate matching points in the first grid area.

In this embodiment of the present application, the first modality image and the second modality image may be considered to be images of different modalities, that is, images collected by cameras with different imaging principles. For example, an infrared image collected by an infrared camera and an image collected by an RGB camera can be considered images of different modalities. In addition, Computed Tomography (CT) images, Magnetic Resonance Imaging (MRI) images and ultrasound images can also be considered images of different modalities.

The imaging principles respectively corresponding to the first modal image and the second modal image may be determined according to actual scene requirements, which are not limited herein. In some examples, the first modality image may be an RGB image, and the second modality image may be a modality image other than an RGB image, such as an infrared image.

The size and division method of the first grid area may also be determined according to the actual scene. The first grid areas in the first modality image may be uniformly distributed. If the resolution of the first modal image is W*H, where W is the width and H is the height, and the size of each first grid area is w*h, then the number of first grid areas is (W/w)*(H/h). The position of a first grid area in the first modal image may be identified by the coordinates of its four vertices.
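The uniform division described above can be sketched as follows. This is only an illustrative sketch: the function name `divide_into_grids`, the requirement that W and H be exact multiples of w and h, and the vertex ordering are assumptions made for the example and are not specified in the application.

```python
def divide_into_grids(W, H, w, h):
    """Return every grid area as a list of four (x, y) vertices.

    ASSUMPTION: W is a multiple of w and H is a multiple of h, so that
    the image holds exactly (W // w) * (H // h) grid areas.
    """
    if W % w or H % h:
        raise ValueError("this sketch assumes the image divides evenly into grids")
    grids = []
    for top in range(0, H, h):
        for left in range(0, W, w):
            right, bottom = left + w, top + h
            # Vertices in the order: top-left, top-right, bottom-right, bottom-left.
            grids.append([(left, top), (right, top), (right, bottom), (left, bottom)])
    return grids


grids = divide_into_grids(W=640, H=480, w=160, h=120)
print(len(grids))  # (640 // 160) * (480 // 120) = 16
print(grids[0])    # [(0, 0), (160, 0), (160, 120), (0, 120)]
```

Because the second grid area occupies the same position in the second modal image, the same vertex list can be reused to index both images.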

In this embodiment of the present application, there may be various manners for determining at least two first candidate matching points in the first grid area. For example, the pixel points obtained by uniform sampling in the first grid area may be used as the first candidate matching points. In addition, the number of the first candidate matching points in the first grid area may also be determined according to the image conditions in the first grid area, so as to determine the coordinate position of each first candidate matching point.

In some embodiments, the first modality image and the second modality image may be coplanar and row-aligned. In this case, the image plane of the first modal image and the image plane of the second modal image are parallel to each other, which reduces stereo parallax, reduces the matching complexity and the amount of computation in the subsequent search for matching point pairs, and improves the accuracy of the grid transformation matrix determined from the matching point pairs.

In some embodiments, before determining at least two first candidate matching points in the first grid area for each first grid area pre-divided in the first modal image, the method further includes:

acquiring a first original image captured by the first camera and a second original image captured by the second camera;

Correcting the first original image and the second original image respectively according to the pre-calibrated first camera parameters of the first camera and the second camera parameters of the second camera;

taking the corrected first original image as the first modal image and the corrected second original image as the second modal image, wherein the first modal image and the second modal image are coplanar and row-aligned.

The first camera parameters may include internal parameters, external parameters and/or distortion parameters of the first camera. The second camera parameters may include internal parameters, external parameters and/or distortion parameters of the second camera. The internal parameters may be parameters related to the characteristics of the corresponding camera, such as the focal length and pixel distribution of the corresponding camera. The external parameters may indicate the pose of the corresponding camera in the world coordinate system, which is determined by the relative pose relationship between the camera and the world coordinate system. Exemplarily, the external parameters may include a rotation vector and a translation vector. The distortion parameters may include radial distortion parameters and/or tangential distortion parameters.

From the first camera parameters and the second camera parameters, the relative relationship between the image coordinate system of the first original image captured by the first camera and the image coordinate system of the second original image captured by the second camera can be determined.

In this embodiment of the present application, the first camera and the second camera may be calibrated in advance to determine the first camera parameters of the first camera and the second camera parameters of the second camera. There may be various specific calibration methods. For example, the first camera and the second camera may be calibrated respectively by Zhang Zhengyou's calibration method.

Exemplarily, correcting the first original image and the second original image respectively may include performing distortion correction and/or stereo correction on the first original image, and performing distortion correction and/or stereo correction on the second original image. Distortion correction corrects the distortion of the corresponding image through the distortion parameters, for example its radial distortion and tangential distortion. Stereo correction brings two images that are not coplanar and row-aligned into coplanar row alignment. After correction, the plane of the corrected first original image is parallel to the plane of the corrected second original image, the optical axes of the corresponding cameras are parallel, and the epipoles of the corrected first original image and the corrected second original image are at infinity.

Through the embodiments of the present application, the first original image and the second original image can be corrected to obtain a first modal image and a second modal image that are coplanar and row-aligned. The subsequent search for matching point pairs is then carried out in images whose image planes are parallel to each other, which avoids errors caused by differences in image distortion and shooting angle, improves the efficiency and accuracy of the search for matching point pairs, and thus improves the accuracy of image alignment.

In some embodiments, determining at least two first candidate matching points in the first grid area for each pre-divided first grid area in the first modal image includes:

for each first grid area pre-divided in the first modal image, determining at least two first candidate matching points in the first grid area according to the pixel point gradient information of the first grid area.

In the embodiments of the present application, the gradient information of a pixel point may reflect how its pixel value changes relative to the pixel values of the surrounding pixel points. In some examples, the pixel point gradient information of the first grid area may indicate the distribution of image content in the first grid area. Therefore, if the gradients of the pixel points in the first grid area vary over a wide range and have large values, the first grid area may carry a large amount of information, and the number of first candidate matching points in the first grid area can be increased.

In some examples, the number of first candidate matching points in the first grid area may be determined by combining the pixel point gradient information of the first grid area with the pixel point gradient information of the second grid area corresponding to the first grid area. For example, if the gradient information of the pixel points in the corresponding second grid area differs greatly from that of the pixel points in the first grid area, the first grid area and its corresponding second grid area can be considered to have low similarity and poor coincidence. The number of first candidate matching points in the first grid area can therefore be increased to obtain as many matching point pairs as possible between the first grid area and its corresponding second grid area, thereby improving the accuracy of image alignment.

In some embodiments, determining, for each first grid area pre-divided in the first modal image, at least two first candidate matching points in the first grid area according to the pixel point gradient information of the first grid area includes:

for each first grid area pre-divided in the first modal image, determining the alignment error between the first grid area and the second grid area corresponding to the first grid area according to the first gradient value of each first pixel point in the first grid area and the second gradient value of each second pixel point in the corresponding second grid area;

determining the number of first candidate matching points in the first grid area according to the alignment error; and

determining the first candidate matching points in the first grid area according to that number.

The alignment error may reflect the difference in pixel point gradient information between the first grid area and the second grid area corresponding to the first grid area. If the alignment error between the first grid area and its corresponding second grid area is large, the two areas can be considered misaligned, and ghosting is likely to occur during subsequent fusion. Therefore, when the alignment error is large, the number of first candidate matching points in the first grid area can be increased so that a more accurate transformation relationship between the first grid area and its corresponding second grid area can be obtained later. Specifically, the alignment error may be calculated by comparing the first gradient value of each first pixel point with the second gradient value of the corresponding second pixel point.

After the alignment error is calculated, the number of first candidate matching points in the first grid area may be determined according to its magnitude. For example, if the alignment error is greater than a preset error threshold, the number of first candidate matching points in the first grid area is determined to be M; if the alignment error is not greater than the preset error threshold, the number is determined to be N, where N and M are positive integers and M is greater than N.

In the embodiments of the present application, after the number of first candidate matching points is determined, the distribution of the first candidate matching points in the first grid area may be determined according to information such as the size of the first grid area and the number of first candidate matching points, so as to determine the position of each first candidate matching point in the first grid area.

Exemplarily, FIG. 2 is a schematic diagram of one way of distributing the first candidate matching points in the first grid area.

In this example, if the alignment error is greater than the preset error threshold, the number of first candidate matching points in the first grid area is determined to be 9; if the alignment error is not greater than the preset error threshold, the number is determined to be 4.
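One possible way to turn a candidate count such as 4 or 9 into pixel positions is to place the points on a regular lattice inside the grid area. The sketch below is an assumption made for illustration: the function name `uniform_candidates`, the restriction to perfect-square counts, and the use of lattice cell centres are not taken from FIG. 2, whose exact layout may differ.

```python
import math


def uniform_candidates(left, top, w, h, count):
    """Place `count` points on a sqrt(count) x sqrt(count) lattice in one grid area.

    ASSUMPTION: `count` is a perfect square (e.g. 4 or 9) and each point sits
    at the centre of a lattice cell, so no point lies on the grid border.
    """
    side = math.isqrt(count)
    if side * side != count:
        raise ValueError("this sketch assumes a perfect-square count")
    points = []
    for j in range(side):
        for i in range(side):
            x = left + (2 * i + 1) * w // (2 * side)
            y = top + (2 * j + 1) * h // (2 * side)
            points.append((x, y))
    return points


print(uniform_candidates(0, 0, 120, 120, 4))  # [(30, 30), (90, 30), (30, 90), (90, 90)]
print(len(uniform_candidates(0, 0, 120, 120, 9)))  # 9
```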

In some embodiments, determining, for each first grid area pre-divided in the first modal image, the alignment error between the first grid area and the second grid area corresponding to the first grid area according to the first gradient value of each first pixel point in the first grid area and the second gradient value of each second pixel point in the corresponding second grid area includes:

for each first pixel point in the first grid area, taking the absolute value of the difference between the first gradient value of the first pixel point and the second gradient value of the second pixel point corresponding to the first pixel point as a first absolute value, wherein the position of the first pixel point in the first modal image is the same as the position of the corresponding second pixel point in the second modal image;

taking the sum of the absolute values of the first gradient values in the first grid area as a first summation result, and taking the sum of the absolute values of the second gradient values in the second grid area corresponding to the first grid area as a second summation result; and

determining the alignment error between the first grid area and the second grid area corresponding to the first grid area according to the first absolute values in the first grid area, the first summation result and the second summation result.

Taking a first grid area A as an example, if the coordinate range of the first grid area A is x ∈ [0, w₁], y ∈ [0, h₁], then the alignment error δ₁ of the first grid area A is:

where grad_rgb(x, y) is the first gradient value of any first pixel point, and grad_spectral(x, y) is the second gradient value of any second pixel point.

If δ₁ ≥ threshold, the number of first candidate matching points may be (3n)², otherwise the number of first candidate matching points may be n², where threshold is the preset error threshold.
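The three quantities named above (the first absolute values, the first summation result and the second summation result) can be combined in code as sketched below. The way they are combined here, dividing the summed absolute differences by the sum of the two summation results, is an assumption chosen for illustration and is not confirmed by the text; the function names are likewise hypothetical. Only the threshold rule with (3n)² and n² follows the description directly.

```python
def alignment_error(grad_rgb, grad_spectral):
    """Alignment error of one grid pair; inputs are equally sized 2D lists of gradients."""
    diff_sum = 0.0  # sum of the first absolute values |grad_rgb - grad_spectral|
    sum_rgb = 0.0   # first summation result: sum of |grad_rgb|
    sum_spec = 0.0  # second summation result: sum of |grad_spectral|
    for row_a, row_b in zip(grad_rgb, grad_spectral):
        for a, b in zip(row_a, row_b):
            diff_sum += abs(a - b)
            sum_rgb += abs(a)
            sum_spec += abs(b)
    denom = sum_rgb + sum_spec
    # ASSUMPTION: normalise by the total gradient magnitude, giving a value in
    # [0, 1]; two completely flat grids are treated as perfectly aligned.
    return diff_sum / denom if denom > 0 else 0.0


def candidate_count(error, threshold, n):
    """(3n)^2 candidates when the error reaches the threshold, otherwise n^2."""
    return (3 * n) ** 2 if error >= threshold else n ** 2


same = alignment_error([[1, 2], [3, 4]], [[1, 2], [3, 4]])
opposite = alignment_error([[1, 2], [3, 4]], [[-1, -2], [-3, -4]])
print(same, candidate_count(same, threshold=0.5, n=2))          # 0.0 4
print(opposite, candidate_count(opposite, threshold=0.5, n=2))  # 1.0 36
```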

Step S102, for each first candidate matching point in the first grid area, search the second modal image for a first target pixel point corresponding to the first candidate matching point, wherein the cross-correlation information between the first target pixel point and the corresponding first candidate matching point meets a preset cross-correlation condition.

In the embodiments of the present application, the cross-correlation information may include a Normalized Cross Correlation (NCC) value, a cross-correlation value calculated according to a preset cross-correlation function, or another value that measures the cross-correlation between two pixel points. The preset cross-correlation condition may be determined according to the type of the cross-correlation information. For example, if the cross-correlation information includes the normalized cross-correlation value, the preset cross-correlation condition may be that the normalized cross-correlation value is greater than a preset cross-correlation threshold.

In some embodiments, the step S102 includes:

Step S301, for each first candidate matching point in the first grid area, determine the region of interest corresponding to the first candidate matching point from the second modal image;

Step S302, for each pixel in the region of interest, calculate the cross-correlation metric value between the pixel and the first candidate matching point;

Step S303, if the maximum of the cross-correlation metric values corresponding to the pixels in the region of interest is greater than a preset threshold, take the pixel in the region of interest corresponding to that maximum as the first target pixel point of the first candidate matching point.

In this embodiment of the present application, the region of interest may be considered the search range for the first target pixel point corresponding to the first candidate matching point. The size of the region of interest may be determined according to scene requirements; for example, it may be predetermined according to information such as the available computing resources and the positional relationship between the cameras corresponding to the first modal image and the second modal image. The cross-correlation metric value may be a Normalized Cross Correlation (NCC) value, a cross-correlation value calculated according to a preset cross-correlation function, and the like.

In some embodiments, the cross-correlation metric value is a normalized cross-correlation value;

Calculating, for each pixel point in the region of interest, the cross-correlation metric value between the pixel point and the first candidate matching point includes:

for each pixel point in the region of interest, calculating the normalized cross-correlation value between the pixel point and the first candidate matching point according to a designated associated area, where the designated associated area is a designated area in the second modal image centered on the pixel point.

Taking a first candidate matching point (x₁, y₁) as an example, the pixel value of the first candidate matching point is R_rgb(x₁, y₁). Determine the coordinate range of the region of interest of the first candidate matching point (x₁, y₁) as

That is, the region of interest is the rectangular area that takes

as its starting point and whose width and height are both k. For a pixel point R_spectral(x, y) in the region of interest, the designated associated area of the pixel point is a rectangular area D centered on R_spectral(x, y) with width and height d.

Calculate the normalized cross-correlation value NCC of the first candidate matching point and the pixel point R_spectral(x, y), and the calculation formula is as follows:

Among them, u_rgb_roi represents the expectation of the pixel values of the pixel points in the region of the first modal image corresponding to the region of interest, u_spectral_roi represents the expectation of the pixel values of the pixel points in the region of interest, σ_rgb_roi represents the variance of the pixel values of the pixel points in the region of the first modal image corresponding to the region of interest, and σ_spectral_roi represents the variance of the pixel values of the pixel points in the region of interest. The larger the NCC, the more similar the first candidate matching point is to the pixel point R_spectral(x, y).

Calculate the NCC value of each pixel point in the region of interest relative to the first candidate matching point, sort the values, and select the largest NCC value NCC(x_m, y_m). If NCC(x_m, y_m) ≥ T_ncc, then the pixel point in the region of interest corresponding to NCC(x_m, y_m) is the first target pixel point of the first candidate matching point. T_ncc may be a preset threshold.
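Exemplarily, the search described above may be sketched as follows. This is only a minimal sketch under stated assumptions, not the implementation of the present application: it uses the commonly known zero-mean form of NCC computed over a d × d window, it assumes that the region of interest is a k × k square centered on the coordinates of the candidate point (the starting point of the region of interest is not reproduced here), and the names `patch`, `ncc` and `find_target` are hypothetical.

```python
import math

def patch(img, cx, cy, d):
    """Flat list of the d x d window of img centred on (cx, cy); d is assumed odd.
    Returns None if the window leaves the image."""
    h = d // 2
    if cx - h < 0 or cy - h < 0 or cy + h >= len(img) or cx + h >= len(img[0]):
        return None
    return [img[y][x] for y in range(cy - h, cy + h + 1)
            for x in range(cx - h, cx + h + 1)]

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized flat patches."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0  # flat patch: correlation undefined, treated as "no match"
    return cov / math.sqrt(va * vb)

def find_target(rgb, spectral, x1, y1, k, d, t_ncc):
    """Return the (x, y) in `spectral` with the largest NCC against the candidate
    (x1, y1) of `rgb`, or None if that largest value is below t_ncc."""
    ref = patch(rgb, x1, y1, d)
    if ref is None:
        return None
    # Assumption: the k x k region of interest is centred on the candidate.
    x0, y0 = x1 - k // 2, y1 - k // 2
    best, best_xy = -1.0, None
    for y in range(y0, y0 + k):
        for x in range(x0, x0 + k):
            cand = patch(spectral, x, y, d)
            if cand is None:
                continue
            score = ncc(ref, cand)
            if score > best:
                best, best_xy = score, (x, y)
    return best_xy if best >= t_ncc else None
```

Because the mean is subtracted and the result is normalized, a pixel point whose neighborhood differs from that of the candidate only by gain and offset still scores close to 1, which is the property relied on when the two images come from different modalities.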

Step S103, if the first target pixel point corresponding to the first candidate matching point is found from the second modal image, the first candidate matching point and the first target pixel point corresponding to the first candidate matching point are used as a set of matching point pairs between the first modal image and the second modal image.

Using the cross-correlation information to measure the similarity between pixel points, so as to find the first target pixel point corresponding to the first candidate matching point, can reduce the interference caused by differences in the gradient directions of images of different modalities in structurally similar regions. Compared with existing feature point detection and matching based on SIFT and similar methods, the accuracy of the obtained matching point pairs can be improved.

In some embodiments, before obtaining the grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image according to each set of matching point pairs, the method further includes:

For each second grid area pre-divided in the second modal image, determine at least two second candidate matching points in the second grid area according to the gradient information of the second grid area;

For each second candidate matching point in the second grid area, a second target pixel point corresponding to the second candidate matching point is searched for from the first modal image, wherein the cross-correlation information between the second target pixel point and the corresponding second candidate matching point meets a preset cross-correlation condition;

If the second target pixel point corresponding to the second candidate matching point is found from the first modal image, the second candidate matching point and the second target pixel point corresponding to the second candidate matching point are used as a set of matching point pairs between the first modal image and the second modal image.

In this embodiment of the present application, the matching point pairs can be searched for not only according to the first candidate matching points, with the image coordinate system of the first modal image as the benchmark, but also according to the second candidate matching points, with the image coordinate system of the second modal image as the benchmark, so that the number of matching point pairs can be increased.
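Exemplarily, the two search directions may be combined into one set of matching point pairs as sketched below. This is a minimal sketch only; the function name `merge_pairs`, the representation of a pair as a tuple of two (x, y) tuples, and the removal of duplicate pairs are assumptions made for illustration.

```python
def merge_pairs(forward, backward):
    """Combine matching point pairs found in both search directions.

    forward:  pairs (point_in_first_image, point_in_second_image)
    backward: pairs (point_in_second_image, point_in_first_image)
    Returns pairs ordered as (first, second), without duplicates,
    keeping the order in which they were found.
    """
    seen, merged = set(), []
    # Flip the backward pairs so that every pair reads (first, second).
    for pair in list(forward) + [(b, a) for a, b in backward]:
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged
```

Storing every pair in the same (first, second) order keeps the later computation of the grid transformation matrix independent of the direction in which a pair was found.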

Step S104, obtaining a grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image according to the matching point pair, wherein the position of the first grid area in the first modal image is the same as the position of the second grid area corresponding to the first grid area in the second modal image.

In this embodiment of the present application, the distribution manner of each of the first grid areas in the first modal image is the same as the distribution manner of each of the second grid areas in the second modal image.

The grid transformation matrix may indicate a translational transformation relationship and/or a rotational transformation relationship of the image in the first grid area with respect to the image in the corresponding second grid area, and the like. The specific calculation method of the grid transformation matrix may be determined according to circumstances such as the number of matching point pairs. If the number of matching point pairs between the first grid area and the second grid area corresponding to the first grid area in the second modal image is not less than a preset number, for example, not less than 4, then the grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image may be calculated from the matching point pairs by means of affine transformation, homography transformation, or the like. If the number of matching point pairs between the first grid area and the second grid area corresponding to the first grid area in the second modal image is less than the preset number, then the matching points, in the second grid area corresponding to the first grid area in the second modal image, of other points in the first grid area may be further calculated according to the existing matching point pairs, and the grid transformation matrix is then calculated.
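Exemplarily, when at least 4 matching point pairs are available, a homography may be fitted as sketched below. This is a minimal sketch under assumptions, not the calculation prescribed by the present application: the last entry of the matrix is fixed to 1, the remaining 8 entries are obtained from the normal equations of the linear system, and the names `solve`, `fit_homography` and `apply_h` are hypothetical.

```python
def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("degenerate point configuration")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fit_homography(src, dst):
    """Least-squares 3x3 homography (last entry fixed to 1) mapping src to dst.
    src, dst: lists of (x, y) with at least 4 corresponding points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve(ata, atb)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def apply_h(H, x, y):
    """Map the point (x, y) through the homography H."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

With exactly 4 pairs the fit is exact; with more pairs the same code returns the least-squares solution of the linearized equations, which is one simple way to use all pairs of a grid area.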

Step S105: Transform the second modal image into a target image aligned with the first modal image according to each grid transformation matrix.

In this embodiment of the present application, each second grid area may be transformed according to its grid transformation matrix, and the transformed second grid areas may then be combined to obtain the target image. In this case, the transformations of the second grid areas can be independent of each other, instead of the entire second modal image being transformed through a single unified transformation matrix, which can improve the accuracy of the transformation of each local area and thereby greatly improve the image alignment precision.

In some embodiments, the obtaining a grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image according to the matching point pair includes:

Constructing a least squares model according to the matching point pair, the coordinates of the designated vertex in the first grid area, and the expected coordinates of the designated vertex of the first grid area in the second grid area corresponding to the first grid area;

Solving the least squares model to obtain the expected coordinates of the designated vertex of the first grid area in the second grid area corresponding to the first grid area;

Obtaining, according to the expected coordinates and the coordinates of the designated vertex in the second grid area corresponding to the first grid area, a homography matrix between the first grid area and the second grid area corresponding to the first grid area, and using the homography matrix as the grid transformation matrix;

The transforming the second modal image into a target image aligned relative to the first modal image according to each grid transformation matrix includes:

For each second grid region, perform perspective transformation on the second grid region according to the homography matrix corresponding to the second grid region;

A target image aligned with respect to the first modality image is obtained according to each perspective transformed second grid area.
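Exemplarily, the per-grid perspective transformation and the combination into the target image may be sketched as follows. This is a minimal sketch under assumptions: each homography is assumed to map coordinates of the first modal image to coordinates of the second modal image (so that each output pixel can be looked up by inverse mapping), nearest-neighbor sampling is used, and the name `warp_by_cells` and the `fill` parameter are hypothetical.

```python
def warp_by_cells(second, cells, homographies, fill=0):
    """Build the aligned target image cell by cell.

    second:        second modal image as a list of rows
    cells:         list of (x0, y0, x1, y1) grid areas, x1 and y1 exclusive
    homographies:  one 3x3 matrix per cell, assumed to map first-image
                   coordinates to second-image coordinates
    """
    h, w = len(second), len(second[0])
    out = [[fill] * w for _ in range(h)]
    for (x0, y0, x1, y1), H in zip(cells, homographies):
        for y in range(y0, y1):
            for x in range(x0, x1):
                d = H[2][0] * x + H[2][1] * y + H[2][2]
                if abs(d) < 1e-12:
                    continue  # point mapped to infinity, keep the fill value
                u = (H[0][0] * x + H[0][1] * y + H[0][2]) / d
                v = (H[1][0] * x + H[1][1] * y + H[1][2]) / d
                ui, vi = int(round(u)), int(round(v))
                if 0 <= ui < w and 0 <= vi < h:
                    out[y][x] = second[vi][ui]  # nearest-neighbor sample
    return out
```

Each grid area uses only its own matrix, so a local misalignment in one area does not influence the transformation of the other areas.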

In some examples, the designated vertices may be the four vertices of the first grid area. In this case, the homography matrix between the first grid area and the second grid area corresponding to the first grid area can be solved from the four vertices. Of course, in some embodiments, when there are multiple matching point pairs, the number of designated vertices may also be less than 3. In this case, the homography matrix may be calculated by combining the expected coordinates of the designated vertices with the matching point pairs.

The construction and solution of the least squares model can be implemented according to the prior art. Exemplarily, the least squares model is constructed according to the matching point pair, the coordinates of the designated vertex in the first grid area, and the expected coordinates of the designated vertex of the first grid area in the second grid area corresponding to the first grid area, and the model can indicate the shape similarity between a first graph and a second graph by calculating a preset error between the vertices of the first graph and the vertices of the second graph. Here, the first graph is the graph formed by the first candidate matching points in the matching point pairs and the designated vertices in the first grid area, and the second graph is the graph formed by the expected coordinates and the first target pixel points in the matching point pairs. The preset error can therefore be minimized by optimization, so as to obtain the expected coordinates of the designated vertices of the first grid area in the second grid area corresponding to the first grid area.

Specifically, the least squares model can be solved by means of the Gauss-Newton method, the gradient descent method, the LM (Levenberg-Marquardt) method, or the like. The specific solution method is not limited here.
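Exemplarily, one possible least squares model and its solution by gradient descent may be sketched as follows. The error term used here is an assumption and is not the preset error of the present application: each first candidate matching point is expressed by bilinear weights of the four vertices of its grid area, the same weights applied to the expected vertex coordinates are required to reproduce the matched first target pixel point, and an optional term weighted by `lam` keeps the vertices near their original positions. The names `bilinear_weights` and `solve_expected_vertices` are hypothetical.

```python
def bilinear_weights(p, cell):
    """Weights of point p with respect to the vertices of the axis-aligned cell
    (x0, y0, x1, y1), in the order top-left, top-right, bottom-right, bottom-left."""
    x0, y0, x1, y1 = cell
    a = (p[0] - x0) / (x1 - x0)
    b = (p[1] - y0) / (y1 - y0)
    return [(1 - a) * (1 - b), a * (1 - b), a * b, (1 - a) * b]

def solve_expected_vertices(pairs, cell, lam=0.1, iters=3000):
    """Minimize sum_pairs |sum_i w_i V'_i - q|^2 + lam * sum_i |V'_i - V_i|^2
    over the expected vertex coordinates V' by gradient descent.

    pairs: list of (p, q), p in the first grid area, q its matched target point.
    """
    if not pairs:
        raise ValueError("at least one matching point pair is required")
    x0, y0, x1, y1 = cell
    verts = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    est = [list(v) for v in verts]          # start from the original vertices
    ws = [bilinear_weights(p, cell) for p, _ in pairs]
    # The weights of each pair have norm at most 1, so this step size is stable.
    step = 1.0 / (2.0 * (len(pairs) + lam))
    for _ in range(iters):
        grad = [[2 * lam * (est[i][c] - verts[i][c]) for c in range(2)]
                for i in range(4)]
        for w, (_, q) in zip(ws, pairs):
            for c in range(2):
                r = sum(w[i] * est[i][c] for i in range(4)) - q[c]
                for i in range(4):
                    grad[i][c] += 2 * r * w[i]
        for i in range(4):
            for c in range(2):
                est[i][c] -= step * grad[i][c]
    return [tuple(v) for v in est]
```

Because this particular model is linear in the unknown coordinates, the Gauss-Newton method or a direct solution of the normal equations would reach the same minimum; gradient descent is used here only because it needs no linear solver.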

In some embodiments, after the target image is acquired, the target image may be stored in the form of a binary file for subsequent image reading and processing. Of course, the target image can also be stored in other formats according to the needs of the scene.

A specific implementation schematic diagram of an embodiment of the present application is described below with a specific example.

As shown in FIG. 4 , in a practical application, an image processing process implemented by an embodiment of the present application is adopted.

Among them, Fig. 4(a) is the first modal image, and Fig. 4(b) is the second modal image. If the first modal image and the second modal image are aligned in a coplanar row, then, after directly merging the first modal image and the second modal image, FIG. 4( c ) is obtained. At this time, a clear positional deviation appears in Fig. 4(c).

After the first modal image and the second modal image are obtained, for each first grid area pre-divided in the first modal image, the alignment error between the first grid area and the second grid area corresponding to the first grid area is determined according to the first gradient value of each first pixel point in the first grid area and the second gradient value of each second pixel point in the second grid area corresponding to the first grid area.

If the alignment error is greater than the preset error threshold, the number of first candidate matching points in the corresponding first grid area may be larger, for example, 9; and if the alignment error is not greater than the preset error threshold, the number of first candidate matching points in the corresponding first grid area may be smaller, for example, 4. FIG. 4(d) shows, for this exemplary scenario, the distribution of the first grid areas with a larger number of first candidate matching points and the first grid areas with a smaller number of first candidate matching points.
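Exemplarily, the choice of the number of first candidate matching points may be sketched as follows. The way the gradient quantities are combined into a single alignment error is an assumption made for illustration (the sum of absolute gradient differences divided by the sum of the absolute gradient values of both grid areas); the counts 9 and 4 are the example values given above, and the names `alignment_error` and `candidate_count` are hypothetical.

```python
def alignment_error(grad1, grad2):
    """Assumed alignment error between two grid areas of equal size.

    grad1, grad2: gradient values of the first and second grid area, as lists
    of rows with pixel points at the same positions.
    """
    diff = sum(abs(a - b) for r1, r2 in zip(grad1, grad2)
               for a, b in zip(r1, r2))
    s1 = sum(abs(a) for row in grad1 for a in row)
    s2 = sum(abs(b) for row in grad2 for b in row)
    total = s1 + s2
    # Two areas without any gradient are treated as aligned.
    return diff / total if total else 0.0

def candidate_count(error, threshold, many=9, few=4):
    """More candidate matching points for poorly aligned grid areas."""
    return many if error > threshold else few
```

With this normalization the error lies between 0 (identical gradients) and 1 (gradients of opposite sign everywhere), so a single threshold can be applied to grid areas with very different contrast.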

After the matching point pairs respectively determined for the first grid areas are obtained, each grid transformation matrix is obtained according to the matching point pairs. Specifically, a least squares model is constructed according to the matching point pairs, the coordinates of the four vertices of each first grid area in the first modal image as shown in FIG. 4(e), and the expected coordinates of the four vertices of the first grid area in the second grid area corresponding to the first grid area. The least squares model is solved to obtain the expected coordinates, as shown in FIG. 4(f), of the four vertices of the first grid area in the second grid area corresponding to the first grid area.

Then, according to the expected coordinates of the four vertices in the second grid area, a homography matrix between the first grid area and the second grid area corresponding to the first grid area is obtained, and perspective transformation is performed on the second grid area, thereby obtaining the target image as shown in FIG. 4(g).

In the embodiment of the present application, for each first grid area pre-divided in the first modal image, at least two first candidate matching points in the first grid area are determined; then, for each first candidate matching point in the first grid area, the first target pixel point corresponding to the first candidate matching point is searched for from the second modal image, wherein the cross-correlation information between the first target pixel point and the corresponding first candidate matching point conforms to a preset cross-correlation condition. In this case, the similarity between pixel points can be measured by the cross-correlation information, so as to find the first target pixel point corresponding to the first candidate matching point.

If the first target pixel point corresponding to the first candidate matching point is found from the second modal image, the first candidate matching point and the first target pixel point corresponding to the first candidate matching point are used as a set of matching point pairs between the first modal image and the second modal image. In this case, for each first grid area, the matching point pairs between the first grid area and the second grid area corresponding to the first grid area in the second modal image can be obtained from the cross-correlation information between pixel points, so that the grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image is obtained according to the matching point pairs. Then, according to each grid transformation matrix, the second modal image is transformed into a target image aligned with respect to the first modal image, thereby realizing image alignment between images of different modalities. Since the matching point pairs required for the image alignment are determined based on the cross-correlation information between pixel points, the interference caused by differences in the gradient directions of images of different modalities in structurally similar regions can be reduced. The accuracy of the matching point pairs is therefore high and, accordingly, the alignment accuracy of the final target image is also ensured. This avoids the problem that existing feature point detection and matching based on SIFT and similar methods cannot find accurate matching point pairs between images of different modalities, which results in poor image alignment accuracy between such images.

It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Corresponding to the above-mentioned image alignment method in the above embodiment, FIG. 5 shows a structural block diagram of an image alignment apparatus provided by the embodiment of the present application. For convenience of description, only the part related to the embodiment of the present application is shown.

Referring to FIG. 5, the image alignment device 5 includes:

A determination module 501, configured to determine at least two first candidate matching points in the first grid area for each first grid area pre-divided in the first modal image;

A search module 502, configured to search for the first target pixel point corresponding to the first candidate matching point from the second modal image for each first candidate matching point in the first grid area, wherein the cross-correlation information between the first target pixel point and the corresponding first candidate matching point meets a preset cross-correlation condition;

The first processing module 503 is configured to, if the first target pixel point corresponding to the first candidate matching point is found from the second modal image, take the first candidate matching point and the first target pixel point corresponding to the first candidate matching point as a set of matching point pairs between the first modal image and the second modal image;

The second processing module 504 is configured to obtain, according to the matching point pair, the grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image, wherein the position of the first grid area in the first modal image is the same as the position of the second grid area corresponding to the first grid area in the second modal image;

The transformation module 505 is configured to transform the second modal image into a target image aligned with respect to the first modal image according to each grid transformation matrix.

Optionally, the determining module 501 is specifically used for:

Optionally, the determining module 501 specifically includes:

a first determining unit, configured to, for each first grid area pre-divided in the first modal image, determine the alignment error between the first grid area and the second grid area corresponding to the first grid area according to the first gradient value of each first pixel point in the first grid area and the second gradient value of each second pixel point in the second grid area corresponding to the first grid area;

a second determining unit, configured to determine the number of the first candidate matching points in the first grid area according to the alignment error;

A third determining unit, configured to determine a first candidate matching point in the first grid area according to the number of the first candidate matching points.

Optionally, the first determining unit specifically includes:

a first processing subunit, configured to, for each first pixel point in the first grid area, take the absolute value of the difference between the first gradient value of the first pixel point and the second gradient value of the second pixel point corresponding to the first pixel point as a first absolute value, wherein the position of the first pixel point in the first modal image is the same as the position of the second pixel point corresponding to the first pixel point in the second modal image;

a second processing subunit, configured to take the sum of the absolute values of the first gradient values in the first grid area as a first summation result, and take the sum of the absolute values of the second gradient values in the second grid area corresponding to the first grid area as a second summation result;

a determination subunit, configured to determine the alignment error between the first grid area and the second grid area corresponding to the first grid area.

Optionally, the image alignment device 5 further includes:

an acquisition module for acquiring the first original image captured by the first camera and the second original image captured by the second camera;

a correction module, configured to correct the first original image and the second original image respectively according to the pre-calibrated first camera parameters of the first camera and the second camera parameters of the second camera;

a third processing module, configured to take the corrected first original image as the first modality image, and take the corrected second original image as the second modality image, wherein the first modality image and The second modality images are aligned in coplanar rows.

Optionally, the image alignment device 5 further includes:

The second determining module is configured to, for each second grid area pre-divided in the second modal image, determine at least two second candidate matching points in the second grid area according to the gradient information of the second grid area;

The second search module is configured to, for each second candidate matching point in the second grid area, search for the second target pixel point corresponding to the second candidate matching point from the first modal image, wherein the cross-correlation information between the second target pixel point and the corresponding second candidate matching point meets a preset cross-correlation condition;

The fourth processing module is configured to, if the second target pixel point corresponding to the second candidate matching point is found from the first modal image, take the second candidate matching point and the second target pixel point corresponding to the second candidate matching point as a set of matching point pairs between the first modal image and the second modal image.

Optionally, the search module 502 specifically includes:

a fourth determining unit, configured to, for each first candidate matching point in the first grid area, determine a region of interest corresponding to the first candidate matching point from the second modal image;

a calculation unit, for calculating the cross-correlation metric value between the pixel and the first candidate matching point for each pixel in the region of interest;

The first processing unit is configured to, if the maximum value of the cross-correlation metric values corresponding to the pixel points in the region of interest is greater than a preset threshold, take the pixel point in the region of interest corresponding to the maximum value as the first target pixel point of the first candidate matching point.

Optionally, the cross-correlation metric value is a normalized cross-correlation value;

The computing unit is specifically used for:

Optionally, the second processing module 504 specifically includes:

The construction unit is configured to construct a least squares model according to the matching point pair, the coordinates of the designated vertex in the first grid area, and the expected coordinates of the designated vertex of the first grid area in the second grid area corresponding to the first grid area;

a solving unit, configured to solve the least squares model to obtain the expected coordinates of the designated vertex of the first grid area in the second grid area corresponding to the first grid area;

The second processing unit is configured to obtain, according to the expected coordinates and the coordinates of the designated vertex in the second grid area corresponding to the first grid area, a homography matrix between the first grid area and the second grid area corresponding to the first grid area, and to use the homography matrix as the grid transformation matrix;

The transformation module 505 specifically includes:

a transformation unit, configured to, for each second grid region, perform perspective transformation on the second grid region according to the homography matrix corresponding to the second grid region;

The third processing unit is configured to obtain a target image aligned relative to the first modal image according to each perspective transformed second grid area.

It should be noted that the information exchange, execution process and other contents between the above-mentioned devices/units are based on the same concept as the method embodiments of the present application. For specific functions and technical effects, please refer to the method embodiments section. It is not repeated here.

Those skilled in the art can clearly understand that, for convenience and simplicity of description, only the division of the above-mentioned functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units and modules as required, that is, the internal structure of the above device is divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated in one unit, and the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above-mentioned system, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

FIG. 6 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 6, the terminal device 6 in this embodiment includes: at least one processor 60 (only one is shown in FIG. 6), a memory 61, and a computer program 62 that is stored in the memory 61 and can run on the at least one processor 60. When the processor 60 executes the computer program 62, the steps in any of the above image alignment method embodiments are implemented.

The above-mentioned terminal device 6 may be a computing device such as a server, a mobile phone, a wearable device, an augmented reality (AR)/virtual reality (VR) device, a desktop computer, a notebook computer or a handheld computer. The terminal device may include, but is not limited to, a processor 60 and a memory 61. Those skilled in the art can understand that FIG. 6 is only an example of the terminal device 6 and does not constitute a limitation on the terminal device 6, which may include more or fewer components than shown, or combine some components, or have different components; for example, it may also include input devices, output devices, network access devices, and so on. The input devices may include keyboards, touchpads, fingerprint collection sensors (for collecting the user's fingerprint information and fingerprint direction information), microphones, cameras, etc., and the output devices may include displays, speakers, and the like.

The above-mentioned processor 60 can be a central processing unit (Central Processing Unit, CPU), and the processor 60 can also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The above-mentioned memory 61 may be an internal storage unit of the above-mentioned terminal device 6 in some embodiments, such as a hard disk or a memory of the terminal device 6. The above-mentioned memory 61 may also be an external storage device of the above-mentioned terminal device 6 in other embodiments, such as a plug-in hard disk equipped on the above-mentioned terminal device 6, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash memory card (Flash Card), etc. Further, the above-mentioned memory 61 may also include both the internal storage unit of the above-mentioned terminal device 6 and an external storage device. The above-mentioned memory 61 is used to store an operating system, an application program, a boot loader (Boot Loader), data, and other programs, for example, the program code of the above-mentioned computer program. The above-mentioned memory 61 can also be used to temporarily store data that has been output or is to be output.

In addition, although not shown, the above-mentioned terminal device 6 may also include a network connection module, such as a Bluetooth module, a Wi-Fi module, a cellular network module, etc., which will not be repeated here.

In this embodiment of the present application, when the processor 60 executes the computer program 62 to implement the steps in any of the above image alignment method embodiments, for each first grid area pre-divided in the first modal image, at least two first candidate matching points in the first grid area are determined; then, for each first candidate matching point in the first grid area, the first target pixel point corresponding to the first candidate matching point is searched for from the second modal image, wherein the cross-correlation information between the first target pixel point and the corresponding first candidate matching point meets a preset cross-correlation condition. In this case, the similarity between pixel points can be measured by the cross-correlation information, so as to find the first target pixel point corresponding to the first candidate matching point.

If the first target pixel point corresponding to the first candidate matching point is found from the second modal image, the first candidate matching point and the first target pixel point corresponding to the first candidate matching point are used as a set of matching point pairs between the first modal image and the second modal image. In this case, for each first grid area, the matching point pairs between the first grid area and the second grid area corresponding to the first grid area in the second modal image can be obtained from the cross-correlation information between pixel points, so that the grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image is obtained according to the matching point pairs. Then, according to each grid transformation matrix, the second modal image is transformed into a target image aligned with respect to the first modal image, thereby realizing image alignment between images of different modalities. Since the matching point pairs required for the image alignment are determined based on the cross-correlation information between pixel points, the interference caused by differences in the gradient directions of images of different modalities in structurally similar regions can be reduced. The accuracy of the matching point pairs is therefore high and, accordingly, the alignment accuracy of the final target image is also ensured. This avoids the problem that existing feature point detection and matching based on SIFT and similar methods cannot find accurate matching point pairs between images of different modalities, which results in poor image alignment accuracy between such images.

Embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the foregoing method embodiments can be implemented.

The embodiments of the present application provide a computer program product, when the computer program product runs on a terminal device, so that the terminal device can implement the steps in the foregoing method embodiments when executed.

If the above-mentioned integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above-mentioned embodiments of the present application can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disc. In some jurisdictions, under legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are only illustrative. For example, the division of the above modules or units is only a logical function division, and there may be other division methods in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described above as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the above-mentioned embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the above-mentioned embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and should be included within the scope of protection of the present application.

Claims

An image alignment method, comprising:

For each first grid area pre-divided in the first modal image, determining at least two first candidate matching points in the first grid area;

For each first candidate matching point in the first grid area, searching the second modal image for a first target pixel point corresponding to the first candidate matching point, wherein the cross-correlation information between the first target pixel point and the corresponding first candidate matching point meets a preset cross-correlation condition;

If the first target pixel point corresponding to the first candidate matching point is found in the second modal image, taking the first candidate matching point and the first target pixel point corresponding to the first candidate matching point as a matching point pair between the first modal image and the second modal image;

According to the matching point pair, obtaining a grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image, wherein the position of the first grid area in the first modal image is the same as the position of the second grid area corresponding to the first grid area in the second modal image;

Transforming the second modality image into a target image aligned relative to the first modality image according to the respective grid transformation matrices.
The image alignment method according to claim 1, wherein the determining, for each first grid area pre-divided in the first modal image, at least two first candidate matching points in the first grid area comprises:

For each first grid area pre-divided in the first modal image, determining at least two first candidate matching points in the first grid area according to the pixel point gradient information of the first grid area.
The image alignment method according to claim 2, wherein the determining, for each first grid area pre-divided in the first modal image, at least two first candidate matching points in the first grid area according to the pixel point gradient information of the first grid area comprises:

For each first grid area pre-divided in the first modal image, determining an alignment error between the first grid area and the second grid area corresponding to the first grid area according to a first gradient value of each first pixel point in the first grid area and a second gradient value of each second pixel point in the second grid area corresponding to the first grid area;

Determining the number of the first candidate matching points in the first grid area according to the alignment error;

Determining the first candidate matching points in the first grid area according to the number of the first candidate matching points.
The image alignment method according to claim 3, wherein the determining, for each first grid area pre-divided in the first modal image, an alignment error between the first grid area and the second grid area corresponding to the first grid area according to the first gradient value of each first pixel point in the first grid area and the second gradient value of each second pixel point in the second grid area corresponding to the first grid area comprises:

For each first pixel point in the first grid area, taking the absolute value of the difference between the first gradient value of the first pixel point and the second gradient value of the second pixel point corresponding to the first pixel point as a first absolute value, wherein the position of the first pixel point in the first modal image is the same as the position of the second pixel point corresponding to the first pixel point in the second modal image;

Taking the sum of the absolute values of the first gradient values in the first grid area as a first summation result, and taking the sum of the absolute values of the second gradient values in the second grid area corresponding to the first grid area as a second summation result;

Determining the alignment error between the first grid area and the second grid area corresponding to the first grid area according to each first absolute value in the first grid area, the first summation result and the second summation result.
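As a non-limiting illustration of the gradient-based steps recited above, the following Python sketch computes an alignment error from the first absolute values and the two summation results, maps the error to a number of candidate matching points, and selects that many points. The claims do not fix how these quantities are combined, so the ratio used in `alignment_error`, the linear mapping and the bounds in `candidate_count`, and the choice of the strongest-gradient pixels in `pick_candidates` are all assumptions of this sketch, as are the function names themselves.

```python
import numpy as np

def gradient_magnitude(region):
    """Per-pixel gradient value of a grid area (assumed: gradient magnitude)."""
    gy, gx = np.gradient(region.astype(np.float64))
    return np.hypot(gx, gy)

def alignment_error(g1, g2, eps=1e-12):
    """Assumed form: sum of first absolute values divided by the sum of the
    first and second summation results, which lies in [0, 1]."""
    first_abs = np.abs(g1 - g2)           # first absolute values
    first_sum = np.abs(g1).sum()          # first summation result
    second_sum = np.abs(g2).sum()         # second summation result
    return first_abs.sum() / (first_sum + second_sum + eps)

def candidate_count(error, min_points=2, max_points=10):
    """Assumed mapping: a larger alignment error yields more candidate points."""
    error = float(np.clip(error, 0.0, 1.0))
    return min_points + int(round(error * (max_points - min_points)))

def pick_candidates(g1, count):
    """Assumed selection: the `count` pixels with the largest gradient values."""
    order = np.argsort(g1, axis=None)[::-1][:count]
    rows, cols = np.unravel_index(order, g1.shape)
    return list(zip(rows.tolist(), cols.tolist()))
```

Under these assumptions, two grid areas with identical gradient values give an error of 0 and therefore the minimum of two candidate matching points, which is consistent with the requirement of at least two first candidate matching points per first grid area.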
The image alignment method according to claim 1, wherein, before the determining, for each first grid area pre-divided in the first modal image, at least two first candidate matching points in the first grid area, the method further comprises:

Acquiring a first original image captured by a first camera and a second original image captured by a second camera;

Correcting the first original image and the second original image respectively according to pre-calibrated first camera parameters of the first camera and pre-calibrated second camera parameters of the second camera;

Taking the corrected first original image as the first modal image, and taking the corrected second original image as the second modal image, wherein the first modal image and the second modal image are coplanar and row-aligned.
The image alignment method according to claim 1, wherein, before the obtaining, according to the matching point pair, a grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image, the method further comprises:

For each second grid area pre-divided in the second modal image, determining at least two second candidate matching points in the second grid area according to the gradient information of the second grid area;

For each second candidate matching point in the second grid area, searching the first modal image for a second target pixel point corresponding to the second candidate matching point, wherein the cross-correlation information between the second target pixel point and the corresponding second candidate matching point meets a preset cross-correlation condition;

If the second target pixel point corresponding to the second candidate matching point is found in the first modal image, taking the second candidate matching point and the second target pixel point corresponding to the second candidate matching point as a matching point pair between the first modal image and the second modal image.
The image alignment method according to claim 1, wherein the searching, for each first candidate matching point in the first grid area, the second modal image for the first target pixel point corresponding to the first candidate matching point comprises:

For each first candidate matching point in the first grid area, determining a region of interest corresponding to the first candidate matching point from the second modal image;

For each pixel point in the region of interest, calculating a cross-correlation metric value between the pixel point and the first candidate matching point;

If the maximum of the cross-correlation metric values corresponding to the pixel points in the region of interest is greater than a preset threshold, taking the pixel point corresponding to the maximum in the region of interest as the first target pixel point of the first candidate matching point.
The image alignment method according to claim 7, wherein the cross-correlation metric value is a normalized cross-correlation value;

The calculating, for each pixel point in the region of interest, a cross-correlation metric value between the pixel point and the first candidate matching point comprises:

For each pixel point in the region of interest, calculating the normalized cross-correlation value between the pixel point and the first candidate matching point according to a specified correlation region, wherein the specified correlation region is a designated area centered on the pixel point in the second modal image.
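As a non-limiting illustration of the search recited above, the following Python sketch scans a region of interest in the second modal image, computes a normalized cross-correlation value for each pixel point from a window centered on that pixel point, and accepts the best pixel point only if its value exceeds a preset threshold. The square window (`half`), the square region of interest around the same coordinates (`search`), the threshold of 0.8 and the zero-mean form of the normalized cross-correlation are assumptions of this sketch. It also assumes single-channel images and a candidate point lying at least `half` pixels from the image border.

```python
import numpy as np

def patch(img, r, c, half):
    """Square correlation window centered on pixel (r, c)."""
    return img[r - half:r + half + 1, c - half:c + half + 1].astype(np.float64)

def ncc(a, b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two equally sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > eps else 0.0

def find_target_pixel(first, second, point, half=3, search=5, threshold=0.8):
    """Return the (row, col) in `second` that best matches `point` in `first`,
    or None when the best score does not exceed the threshold."""
    r0, c0 = point
    h, w = second.shape
    template = patch(first, r0, c0, half)
    best_score, best_pixel = -1.0, None
    # Region of interest: pixels within `search` of (r0, c0), clipped so that
    # every correlation window stays inside the second modal image.
    for r in range(max(half, r0 - search), min(h - half, r0 + search + 1)):
        for c in range(max(half, c0 - search), min(w - half, c0 + search + 1)):
            score = ncc(template, patch(second, r, c, half))
            if score > best_score:
                best_score, best_pixel = score, (r, c)
    return best_pixel if best_score > threshold else None
```

Because the windows are mean-subtracted and normalized, the score is unchanged by a gain or offset in intensity between the two images. When the two modal images are coplanar and row-aligned, the region of interest could plausibly be narrowed to a band around the same row, although this sketch does not do so.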
The image alignment method according to any one of claims 1 to 8, wherein the obtaining, according to the matching point pair, a grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image comprises:

Constructing a least squares model according to the matching point pair, the coordinates of the designated vertex of the first grid area, and the expected coordinates of the designated vertex of the first grid area in the second grid area corresponding to the first grid area;

Solving the least squares model to obtain the expected coordinates of the designated vertex of the first grid area in the second grid area corresponding to the first grid area;

According to the expected coordinates and the coordinates of the designated vertex of the second grid area corresponding to the first grid area, obtaining a homography matrix between the first grid area and the second grid area corresponding to the first grid area, and using the homography matrix as the grid transformation matrix;

The transforming the second modal image into a target image aligned relative to the first modal image according to each grid transformation matrix comprises:

For each second grid area, performing perspective transformation on the second grid area according to the homography matrix corresponding to the second grid area;

Obtaining a target image aligned relative to the first modal image according to each perspective-transformed second grid area.
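As a non-limiting illustration of the least squares model and the homography matrix recited above, the following Python sketch expresses each matched point of a first grid area as a bilinear combination of the four corner vertices, solves for the expected coordinates of those vertices in the second modal image, and then estimates the homography that maps the expected coordinates back onto the grid vertices. The bilinear parameterization, the small regularization term `reg` that pulls the solution toward the original vertices, the direct linear transform used in `homography`, and all function names are assumptions of this sketch; the claims do not specify the form of the least squares model.

```python
import numpy as np

def bilinear_weights(p, x0, y0, x1, y1):
    """Weights of p = (x, y) for the vertices ordered TL, TR, BL, BR."""
    u = (p[0] - x0) / (x1 - x0)
    v = (p[1] - y0) / (y1 - y0)
    return np.array([(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v])

def expected_vertices(pairs, x0, y0, x1, y1, reg=1e-3):
    """Solve min sum ||w_i V - q_i||^2 + reg * ||V - verts||^2 for V (4 x 2).
    `pairs` holds (p, q): p in the first grid area, q its match in the second image."""
    verts = np.array([[x0, y0], [x1, y0], [x0, y1], [x1, y1]], dtype=float)
    W = np.array([bilinear_weights(p, x0, y0, x1, y1) for p, _ in pairs])
    Q = np.array([q for _, q in pairs], dtype=float)
    A = np.vstack([W, np.sqrt(reg) * np.eye(4)])
    B = np.vstack([Q, np.sqrt(reg) * verts])
    solution, *_ = np.linalg.lstsq(A, B, rcond=None)
    return verts, solution

def homography(src, dst):
    """Direct linear transform from four (or more) point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    H = vt[-1].reshape(3, 3)      # null-space vector of the constraint matrix
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map an N x 2 array of points through H."""
    pts = np.asarray(pts, dtype=float)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With `verts, expected = expected_vertices(pairs, x0, y0, x1, y1)`, the matrix `homography(expected, verts)` plays the role of the grid transformation matrix in this sketch. The perspective transformation of the pixels of each second grid area could then be carried out with a standard warping routine such as OpenCV's `cv2.warpPerspective`, which is not shown here.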
An image alignment device, comprising:

a determining module, configured to determine at least two first candidate matching points in the first grid area for each first grid area pre-divided in the first modal image;

a search module, configured to search, for each first candidate matching point in the first grid area, the second modal image for a first target pixel point corresponding to the first candidate matching point, wherein the cross-correlation information between the first target pixel point and the corresponding first candidate matching point meets a preset cross-correlation condition;

a first processing module, configured to, if the first target pixel point corresponding to the first candidate matching point is found in the second modal image, take the first candidate matching point and the first target pixel point corresponding to the first candidate matching point as a matching point pair between the first modal image and the second modal image;

a second processing module, configured to obtain, according to the matching point pair, a grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image, wherein the position of the first grid area in the first modal image is the same as the position of the second grid area corresponding to the first grid area in the second modal image;

a transformation module, configured to transform the second modal image into a target image aligned relative to the first modal image according to each grid transformation matrix.
A terminal device, comprising a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the processor implements the following steps when executing the computer program:

For each first grid area pre-divided in the first modal image, determining at least two first candidate matching points in the first grid area;

For each first candidate matching point in the first grid area, searching the second modal image for a first target pixel point corresponding to the first candidate matching point, wherein the cross-correlation information between the first target pixel point and the corresponding first candidate matching point meets a preset cross-correlation condition;

If the first target pixel point corresponding to the first candidate matching point is found in the second modal image, taking the first candidate matching point and the first target pixel point corresponding to the first candidate matching point as a matching point pair between the first modal image and the second modal image;

According to the matching point pair, obtaining a grid transformation matrix between the first grid area and the second grid area corresponding to the first grid area in the second modal image, wherein the position of the first grid area in the first modal image is the same as the position of the second grid area corresponding to the first grid area in the second modal image;

Transforming the second modality image into a target image aligned relative to the first modality image according to the respective grid transformation matrices.
The terminal device according to claim 11, wherein when the computer program is executed by the processor, the following steps are further implemented:

For each first grid area pre-divided in the first modal image, determining at least two first candidate matching points in the first grid area according to the pixel point gradient information of the first grid area.
The terminal device according to claim 12, wherein when the computer program is executed by the processor, the following steps are further implemented:

For each first grid area pre-divided in the first modal image, determining an alignment error between the first grid area and the second grid area corresponding to the first grid area according to a first gradient value of each first pixel point in the first grid area and a second gradient value of each second pixel point in the second grid area corresponding to the first grid area;

Determining the number of the first candidate matching points in the first grid area according to the alignment error;

Determining the first candidate matching points in the first grid area according to the number of the first candidate matching points.
The terminal device according to claim 13, wherein when the computer program is executed by the processor, the following steps are further implemented:

For each first pixel point in the first grid area, taking the absolute value of the difference between the first gradient value of the first pixel point and the second gradient value of the second pixel point corresponding to the first pixel point as a first absolute value, wherein the position of the first pixel point in the first modal image is the same as the position of the second pixel point corresponding to the first pixel point in the second modal image;

Taking the sum of the absolute values of the first gradient values in the first grid area as a first summation result, and taking the sum of the absolute values of the second gradient values in the second grid area corresponding to the first grid area as a second summation result;

Determining the alignment error between the first grid area and the second grid area corresponding to the first grid area according to each first absolute value in the first grid area, the first summation result and the second summation result.
The terminal device according to claim 11, wherein when the computer program is executed by the processor, the following steps are further implemented:

acquiring a first original image captured by the first camera and a second original image captured by the second camera;

Correcting the first original image and the second original image respectively according to the pre-calibrated first camera parameters of the first camera and the second camera parameters of the second camera;

Taking the corrected first original image as the first modal image, and taking the corrected second original image as the second modal image, wherein the first modal image and the second modal image are coplanar and row-aligned.
The terminal device according to claim 11, wherein when the computer program is executed by the processor, the following steps are further implemented:

For each second grid area pre-divided in the second modal image, determining at least two second candidate matching points in the second grid area according to the gradient information of the second grid area;

For each second candidate matching point in the second grid area, searching the first modal image for a second target pixel point corresponding to the second candidate matching point, wherein the cross-correlation information between the second target pixel point and the corresponding second candidate matching point meets a preset cross-correlation condition;

If the second target pixel point corresponding to the second candidate matching point is found in the first modal image, taking the second candidate matching point and the second target pixel point corresponding to the second candidate matching point as a matching point pair between the first modal image and the second modal image.
The terminal device according to claim 11, wherein when the computer program is executed by the processor, the following steps are further implemented:

For each first candidate matching point in the first grid area, determining a region of interest corresponding to the first candidate matching point from the second modal image;

For each pixel in the region of interest, calculating a cross-correlation metric value between the pixel and the first candidate matching point;

If the maximum of the cross-correlation metric values corresponding to the pixel points in the region of interest is greater than a preset threshold, taking the pixel point corresponding to the maximum in the region of interest as the first target pixel point of the first candidate matching point.
The terminal device according to claim 17, wherein the cross-correlation metric value is a normalized cross-correlation value;

When the computer program is executed by the processor, the following steps are also implemented:

For each pixel point in the region of interest, calculating the normalized cross-correlation value between the pixel point and the first candidate matching point according to a specified correlation region, wherein the specified correlation region is a designated area centered on the pixel point in the second modal image.
The terminal device according to claim 11, wherein when the computer program is executed by the processor, the following steps are further implemented:

Constructing a least squares model according to the matching point pair, the coordinates of the designated vertex of the first grid area, and the expected coordinates of the designated vertex of the first grid area in the second grid area corresponding to the first grid area;

Solving the least squares model to obtain the expected coordinates of the designated vertex of the first grid area in the second grid area corresponding to the first grid area;

According to the expected coordinates and the coordinates of the designated vertex of the second grid area corresponding to the first grid area, obtaining a homography matrix between the first grid area and the second grid area corresponding to the first grid area, and using the homography matrix as the grid transformation matrix;

The transforming the second modal image into a target image aligned relative to the first modal image according to each grid transformation matrix includes:

For each second grid region, perform perspective transformation on the second grid region according to the homography matrix corresponding to the second grid region;

A target image aligned with respect to the first modality image is obtained according to each perspective-transformed second grid area.
A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the image alignment method according to any one of claims 1 to 9 is implemented.