CN116468609A - Super-glue-based two-stage zoom camera multi-image stitching method and system

Super-glue-based two-stage zoom camera multi-image stitching method and system

Info

Publication number
CN116468609A
Authority
CN
China
Prior art keywords
image
template base
pixel
clear
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310459405.0A
Other languages
Chinese (zh)
Inventor
刘志文
杨景翔
程思远
许根
吴佳宗
肖江剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Institute of Material Technology and Engineering of CAS
Original Assignee
Ningbo Institute of Material Technology and Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Institute of Material Technology and Engineering of CAS filed Critical Ningbo Institute of Material Technology and Engineering of CAS
Priority to CN202310459405.0A priority Critical patent/CN116468609A/en
Publication of CN116468609A publication Critical patent/CN116468609A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a SuperGlue-based multi-image stitching method and system for a two-stage zoom camera, wherein the method comprises the following steps: step one, collecting large-angle-of-view images and stitching them to obtain a template base map; step two, up-sampling and blocking the template base map to generate template base map blocks; step three, calculating the positions of the shooting points, automatically acquiring clear images, and making all acquired clear images cover the designated area; step four, selecting a clear image and a template base map block, performing feature matching, calculating a homography matrix and transforming the clear image; step five, calculating the optical flow between the covered template base map region and the transformed clear image; step six, carrying out pixel-by-pixel registration on the transformed clear image according to the optical flow so as to align it with the template base map; and step seven, transforming all the clear images to the coordinate system of the template base map and fusing the images. The image stitching method has good stitching efficiency and stitching accuracy.

Description

Super-glue-based two-stage zoom camera multi-image stitching method and system
Technical Field
The invention relates to a method and system for stitching multiple multi-view images, in particular to a SuperGlue- and optical-flow-field-based method and system for stitching multiple images from a two-stage zoom camera, and belongs to the technical field of image processing.
Background
With the rapid development of computer vision technology, cameras are widely used in fields such as security and traffic monitoring, medical imaging and industrial inspection. For some large-scale scenes, the limited field of view of a camera makes it difficult to obtain a complete picture, so image stitching is often used to obtain images covering a wide field of view. However, stitching tasks involving large viewing-angle changes and a large number of images are difficult: misalignment, ghosting and similar artifacts are easily produced, and the stitching process is complex. It is therefore significant to study an efficient stitching method for multi-view, multi-image scenarios.
The traditional image stitching method generally uses the Sift or Orb algorithm to extract image features and then obtains the image transformation matrix through feature matching to complete alignment and stitching; this is the most practical and common approach. However, for some sparsely textured scenes it may be difficult to obtain enough feature points and accurate feature matches. For multi-plane scenes, a single homography matrix can hardly align the whole image; a representative solution is the spatial-domain-transformation approach, such as the APAP algorithm, which aligns the overlapping region by dividing the image into grids and computing a large number of local homography matrices, but the computation is too complex, many parameters need to be tuned, and the non-overlapping region is prone to distortion.
For multi-image stitching, the traditional approach generally stitches the images one by one in sequence; the stitching process is complex and prone to accumulated errors, so the stitching result is noticeably distorted or deformed and not natural enough.
Disclosure of Invention
In order to solve the above problems, the invention provides an efficient multi-image stitching method based on a zoom pan-tilt camera. Image stitching is divided into two steps: the stitching of the large-angle-of-view images is completed first, and the fine stitching of the clear images is then completed on that basis; the stitching of each clear image is independent of the others, which simplifies the stitching process. Feature extraction and matching are performed with the SuperPoint and SuperGlue algorithms, which are robust and fast in sparse-texture scenes, and the introduction of optical flow field registration reduces the stitching errors that a single homography matrix inevitably produces.
In order to achieve the aim of the invention, the invention adopts the following scheme:
the invention provides a super glue-based two-stage zoom camera multi-image stitching method, which comprises the following steps of:
firstly, reducing the multiplying power of a pan-tilt camera, shooting a plurality of images with large angles of view, and performing image stitching by adopting a feature matching algorithm to generate a template base image;
step two, up-sampling and blocking the template base map to generate a template base map block;
step three, calculating and recording the position of a shooting point by combining a field geometric model of a camera and an area needing shooting and combining a field angle of a tripod head camera, automatically acquiring clear images, and enabling all acquired clear images to cover a designated area;
selecting a clear image, selecting the template base image blocks with approximately the same coverage area, performing feature extraction and matching on the clear image and the template base image blocks by adopting SuperPoint and SuperGlue algorithms, removing mismatching points, restoring pixel point positions of template base image blocks in the matching points to pixel positions in the original template base image, and calculating a homography transformation matrix;
step five, transforming the clear image to a coordinate system of an original template base image by using the homography transformation matrix, obtaining a template base image of a coverage area of the transformed clear image, and calculating an optical flow between the coverage area template base image and the transformed clear image;
step six, carrying out pixel-by-pixel registration on the transformed clear image according to the optical flow so as to align the clear image with the template base map;
and seventhly, repeating the fourth step to the sixth step until all the clear images are transformed to the coordinate system of the template base map, and finally fusing the images.
In one embodiment, the calculating the position of the shooting point in the third step specifically includes:
assume that when the camera faces straight down, the spatial coordinates of the four corners A, B, C and D of the camera field-of-view region on the imaging plane at a distance h are determined by h, the horizontal angle of view α and the vertical angle of view β; when the pan-tilt rotates left-right or up-down by an angle θ, each of the points A, B, C and D is denoted uniformly as v0, and the coordinates after rotation and translation are v′0 = T·R(z)·R(x)·v0, where R(z) and R(x) denote the rotation matrices about the z axis and the x axis respectively and T denotes the translation; the intersection points of the rays through v′0 with the ground are then the coordinates of the four corner points of the camera field-of-view footprint.
In one embodiment, the optical flow calculation in the fifth step is as follows:
I(x, y, t) = I(x+dx, y+dy, t+dt) expresses that a pixel point moves by (dx, dy) from one frame to the next in a time dt while its gray value remains unchanged; by Taylor expansion,
I(x+dx, y+dy, t+dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt,
from which it is further obtained that
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0;
let u = dx/dt and v = dy/dt denote the velocity components of the optical flow along the x and y axes respectively, so that
I_x·u + I_y·v + I_t = 0,
where I_x, I_y and I_t denote the partial derivatives of the image gray values in the X, Y and T directions respectively, which can be obtained from the image data, and (u, v) is the optical flow along the X and Y axes.
In one embodiment, in the sixth step, the pixel-by-pixel registration is performed on the transformed clear image according to the optical flow, so that the process of aligning the template base map uses the following calculation formula:
F(x+u,y+v)=P(x,y);
wherein P is a clear image coordinate pixel value before aligning the template base map, F is a clear image pixel value after aligning the template base map, X and Y represent pixel coordinates, and u and v represent optical flow magnitudes in the X and Y directions of the pixel positions.
In one embodiment, the fusing of the images in the seventh step specifically adopts the following calculation formula:
P(x,y)=w1*P1(x,y)+w2*P2(x,y)
wherein P represents the pixel value of the fused image, P1 and P2 respectively represent the pixel values of the overlapping areas of two mutually overlapped clear images, and w1 and w2 represent the weights of the pixel values of the two images.
Another aspect of the present invention provides a SuperGlue-based two-stage zoom camera multi-image stitching system, wherein the image stitching system comprises:
the image acquisition and template base image generation module is used for reducing the multiplying power of the cradle head camera, shooting a plurality of large-angle-of-view images, and performing image stitching by adopting a feature matching algorithm to generate a template base image;
the template base map block generation module is used for upsampling and blocking the template base map to generate a template base map block;
the clear image acquisition module is used for calculating and recording the position of a shooting point by combining a view field geometric model of a camera and a region needing to be shot and combining a view field angle of a cradle head camera, automatically acquiring clear images and enabling all acquired clear images to cover a designated region;
the transformation matrix calculation module is used for selecting a clear image, selecting the template base image blocks with approximately the same coverage area, carrying out feature extraction and matching on the clear image and the template base image blocks by adopting SuperPoint and SuperGlue algorithms, removing mismatching points, restoring the pixel point positions of the template base image blocks in the matching points to the pixel positions in the original template base image, and calculating a homography transformation matrix;
the optical flow calculation module is used for transforming the clear image to a coordinate system of an original template base image by using the homography transformation matrix, obtaining a template base image of a coverage area of the transformed clear image, and calculating the optical flow between the coverage area template base image and the transformed clear image;
the pixel matching module is used for carrying out pixel-by-pixel registration on the transformed clear image according to the optical flow so as to align the clear image with the template base map;
and the image fusion module is used for fusing the images after all the clear images are transformed to the coordinate system of the template base map.
In one embodiment, the clear image acquisition module calculates a position of a shooting point, and specifically includes:
assume that when the camera faces straight down, the spatial coordinates of the four corners A, B, C and D of the camera field-of-view region on the imaging plane at a distance h are determined by h, the horizontal angle of view α and the vertical angle of view β; when the pan-tilt rotates left-right or up-down by an angle θ, each of the points A, B, C and D is denoted uniformly as v0, and the coordinates after rotation and translation are v′0 = T·R(z)·R(x)·v0, where R(z) and R(x) denote the rotation matrices about the z axis and the x axis respectively and T denotes the translation; the intersection points of the rays through v′0 with the ground are then the coordinates of the four corner points of the camera field-of-view footprint.
In one embodiment, the optical flow calculation module calculates the optical flow as follows:
I(x, y, t) = I(x+dx, y+dy, t+dt) expresses that a pixel point moves by (dx, dy) from one frame to the next in a time dt while its gray value remains unchanged; by Taylor expansion,
I(x+dx, y+dy, t+dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt,
from which it is further obtained that
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0;
let u = dx/dt and v = dy/dt denote the velocity components of the optical flow along the x and y axes respectively, so that
I_x·u + I_y·v + I_t = 0,
where I_x, I_y and I_t denote the partial derivatives of the image gray values in the X, Y and T directions respectively, which can be obtained from the image data, and (u, v) is the optical flow along the X and Y axes.
In one embodiment, the pixel matching module performs pixel-by-pixel registration on the transformed clear image according to the optical flow, so that the process of aligning the transformed clear image with the template base map adopts the following calculation formula:
F(x+u,y+v)=P(x,y);
wherein P is a clear image coordinate pixel value before aligning the template base map, F is a clear image pixel value after aligning the template base map, X and Y represent pixel coordinates, and u and v represent optical flow magnitudes in the X and Y directions of the pixel positions.
In one embodiment, the image fusion module performs image fusion specifically using the following calculation formula:
P(x,y)=w1*P1(x,y)+w2*P2(x,y)
wherein P represents the pixel value of the fused image, P1 and P2 respectively represent the pixel values of the overlapping areas of two mutually overlapped clear images, and w1 and w2 represent the weights of the pixel values of the two images.
Compared with the prior art, the invention has at least the following advantages: (1) image stitching is split into two steps: the stitching of the large-angle-of-view images is completed first, and the fine stitching of the clear images is then completed on that basis; the stitching of each clear image is independent of the others, which simplifies the stitching process; (2) feature extraction and matching are performed with the SuperPoint and SuperGlue algorithms, which are robust and fast in sparse-texture scenes; (3) the introduction of optical flow field registration reduces the stitching errors that a single homography matrix inevitably produces.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a stitching method in an embodiment of the present invention;
FIG. 2 is a view field geometric model of pan/tilt head shooting in an embodiment of the present invention;
FIG. 3 is an example of a template base map in an embodiment of the invention;
FIG. 4 is a comparison of feature matching between the Sift algorithm and the SuperGlue algorithm;
FIG. 5a is a clear image in an embodiment of the present invention, and FIG. 5b is a schematic diagram of its stitching in an embodiment of the present invention;
fig. 6a and 6b are diagrams of partial alignment before and after optical flow registration, respectively, in an embodiment of the present invention.
Detailed Description
The inventor proposes the technical solution of the present invention in order to overcome the above defects and problems of the existing image stitching technology.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention will be given with reference to the accompanying drawings. Examples of these preferred embodiments are illustrated in the accompanying drawings. The embodiments of the invention shown in the drawings and described in accordance with the drawings are merely exemplary and the invention is not limited to these embodiments.
It should be noted here that, in order to avoid obscuring the present invention due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, while other details not greatly related to the present invention are omitted.
In a more typical embodiment of the present application, the method comprises the steps of:
1) In order to achieve accurate stitching of a number of multi-view images captured by the pan-tilt camera, the magnification of the pan-tilt camera is first reduced, so that the angle of view is larger, and several low-definition images covering the specified range are captured. The magnification is an attribute of a zoom camera: the lower the magnification, the larger the angle of view. Because the viewing-angle change between the large-angle-of-view images is small and their features are rich, the homography transformation matrix is calculated with a conventional feature matching method and the images are stitched to obtain the template base map.
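By way of illustration only, the following Python/OpenCV sketch shows this kind of conventional feature-matching stitching of two large-angle-of-view images (SIFT features, Lowe ratio test, RANSAC homography). The function name, the canvas size and the omission of blending are assumptions for the sketch, not the exact procedure of the invention.

```python
import cv2
import numpy as np

def stitch_pair_sift(img_a, img_b):
    """Coarse stitching of two large-field-of-view images with SIFT features.
    img_b is warped into img_a's coordinate system on a canvas twice as wide;
    blending is omitted for brevity."""
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(gray_a, None)
    kp_b, des_b = sift.detectAndCompute(gray_b, None)
    matches = cv2.BFMatcher().knnMatch(des_b, des_a, k=2)
    # Lowe ratio test to keep distinctive matches only.
    good = [m[0] for m in matches if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    src = np.float32([kp_b[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # reject outlier matches
    h, w = img_a.shape[:2]
    canvas = cv2.warpPerspective(img_b, H, (2 * w, h))
    canvas[:h, :w] = img_a  # keep the reference image in the left half
    return canvas
```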
2) The template base map obtained in step 1) is up-sampled and divided into 2 x 2 or 4 x 4 blocks to obtain the template base map blocks.
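A minimal sketch of this up-sampling and blocking step is given below; the bicubic interpolation, the scale factor and the square grid size are illustrative parameters only, and recording each block's origin makes it possible to restore matched pixel coordinates to the full template later.

```python
import cv2

def make_template_blocks(template, scale=2, grid=2):
    """Upsample the stitched template base map and split it into grid x grid blocks.

    Returns a list of (block_image, (x0, y0)) where (x0, y0) is the block's
    top-left corner in the upsampled template, so that matched pixel positions
    can be restored to the full template coordinate system."""
    h, w = template.shape[:2]
    up = cv2.resize(template, (w * scale, h * scale), interpolation=cv2.INTER_CUBIC)
    bh, bw = up.shape[0] // grid, up.shape[1] // grid
    blocks = []
    for r in range(grid):
        for c in range(grid):
            y0, x0 = r * bh, c * bw
            blocks.append((up[y0:y0 + bh, x0:x0 + bw], (x0, y0)))
    return blocks
```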
3) Clear images are acquired: the positions of the shooting points are calculated and recorded by combining the field-of-view geometric model of the camera, the area to be photographed (as shown in the figure) and the angle of view of the pan-tilt camera. The calculation process is as follows:
let the spatial coordinates of the four corners A, B, C and D of the camera field-of-view region on the imaging plane at a distance h be determined by h, the horizontal angle of view α and the vertical angle of view β. When the pan-tilt rotates left-right or up-down by an angle θ, each of the points A, B, C and D is denoted uniformly as v0, and the coordinates after rotation and translation are v′0 = T·R(z)·R(x)·v0, where R(z) and R(x) denote the rotation matrices about the z axis and the x axis respectively and T denotes the translation. The intersection points of the rays through v′0 with the ground are the coordinates of the four corner points of the camera field-of-view footprint.
Therefore, the preset points within the camera field of view can be obtained from three groups of data, namely the height, the magnification and the rotation angle of the pan-tilt camera, and the preset points are set so that all captured images cover the designated area, whereby the acquisition of clear images is completed automatically.
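The sketch below illustrates one possible realisation of this footprint calculation. The corner-ray parameterisation via tan(α/2) and tan(β/2), the rotation order R(z)·R(x) and the function name fov_ground_corners are assumptions for illustration, since the exact corner coordinates are not spelled out in the text.

```python
import numpy as np

def fov_ground_corners(h, alpha, beta, pitch, yaw):
    """Project the four field-of-view corner rays of a pan-tilt camera at height h
    onto the ground plane z = 0, following v0' = T*R(z)*R(x)*v0."""
    ta, tb = np.tan(alpha / 2), np.tan(beta / 2)
    # Corner directions for a camera looking straight down (-z), at unit depth
    # (assumed parameterisation; the patent does not give the explicit values).
    corners = np.array([[ ta,  tb, -1.0],
                        [-ta,  tb, -1.0],
                        [-ta, -tb, -1.0],
                        [ ta, -tb, -1.0]])
    cx, sx = np.cos(pitch), np.sin(pitch)
    cz, sz = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])   # rotation about x axis
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])   # rotation about z axis
    cam = np.array([0.0, 0.0, h])                            # camera position (translation T)
    ground = []
    for d in corners:
        d = Rz @ Rx @ d                 # rotate the corner ray
        # Ray/ground intersection, assuming the rotated ray still points downward (d[2] < 0).
        t = -cam[2] / d[2]
        ground.append(cam + t * d)
    return np.array(ground)
```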
4) A clear image is selected, together with the template base map block covering approximately the same area, and feature extraction and matching are performed on the clear image and the template base map block with the SuperPoint and SuperGlue algorithms. After mismatched points are removed with the random sample consensus (RANSAC) method, the pixel positions of the template base map block in the matching points are restored to their pixel positions in the original template base map, and the homography transformation matrix is calculated. The homography transformation matrix is an image linear geometric transformation matrix with eight degrees of freedom; a linear system of equations is established from the coordinates of several pairs of matching points of the two images, and the best result is fitted with the least squares method.
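A sketch of this step is given below. Here match_superglue stands for a hypothetical wrapper around a pretrained SuperPoint + SuperGlue matcher and is assumed to return two (N, 2) arrays of corresponding pixel coordinates; the outlier rejection and homography estimation use OpenCV's RANSAC-based findHomography.

```python
import cv2
import numpy as np

def block_homography(clear_img, block_img, block_origin, match_superglue):
    """Estimate the homography from a clear image to the full template base map.

    block_origin is the (x0, y0) position of the template block inside the
    original template, so that block-local matches can be restored to the
    template coordinate system before the homography is fitted."""
    pts_clear, pts_block = match_superglue(clear_img, block_img)
    # Restore block-local coordinates to the original template coordinate system.
    pts_template = pts_block + np.asarray(block_origin, dtype=np.float32)
    # RANSAC removes mismatched pairs while fitting the 8-DoF homography.
    H, inliers = cv2.findHomography(pts_clear, pts_template, cv2.RANSAC, 3.0)
    return H, inliers
```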
5) The clear image is transformed using the homography transformation matrix calculated in step 4). A template base map pixel coordinate system is established with the upper-left corner of the template base map as the origin, and the clear image is transformed into the coordinate system of the template base map through operations such as rotation, scaling, translation, shearing and mirroring, giving the template base map region covered by the transformed clear image. The dense optical flow between the transformed clear image and the template base map is then obtained. Optical flow is the projection of the three-dimensional motion field in space onto the image; it represents the magnitude and direction of motion of an image pixel at a certain moment. The calculation process is as follows:
I(x, y, t) = I(x+dx, y+dy, t+dt) expresses that a pixel point moves by (dx, dy) from one frame to the next in a time dt while its gray value remains unchanged; by Taylor expansion,
I(x+dx, y+dy, t+dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt,
from which it is further obtained that
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0;
let u = dx/dt and v = dy/dt denote the velocity components of the optical flow along the x and y axes respectively, so that
I_x·u + I_y·v + I_t = 0,
where I_x, I_y and I_t denote the partial derivatives of the image gray values in the X, Y and T directions respectively, which can be obtained from the image data, and (u, v) is the optical flow along the X and Y axes.
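The following sketch combines the homography warp with a dense-flow computation. Farneback flow is used here only as one readily available dense-flow estimator, since the text does not prescribe a particular optical flow algorithm; the flow is computed from the template region to the warped image so that the registration sketch in the next step can use backward warping.

```python
import cv2
import numpy as np

def warp_and_flow(clear_img, template, H):
    """Warp the clear image into the template coordinate system and compute the
    dense optical flow between the covered template region and the warped image."""
    h, w = template.shape[:2]
    warped = cv2.warpPerspective(clear_img, H, (w, h))
    # Mask of the area actually covered by the warped clear image.
    cover = np.full(clear_img.shape[:2], 255, dtype=np.uint8)
    mask = cv2.warpPerspective(cover, H, (w, h))
    g_tpl = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    g_wrp = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    # Dense flow from the template to the warped clear image.
    flow = cv2.calcOpticalFlowFarneback(g_tpl, g_wrp, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return warped, mask, flow
```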
6) The transformed clear image is registered pixel by pixel using the optical flow obtained in step 5), aligning the clear image with the template base map according to the following formula.
F(x+u,y+v)=P(x,y);
Wherein P is a clear image coordinate pixel value before aligning the template base map, F is a clear image pixel value after aligning the template base map, X and Y represent pixel coordinates, and u and v represent optical flow magnitudes in the X and Y directions of the pixel position.
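One way to realise this alignment is sketched below with backward warping via cv2.remap, assuming the flow was computed from the template region to the warped clear image as in the previous sketch; the formula above describes the forward mapping, and the direction convention used here is an implementation choice, not part of the method itself.

```python
import cv2
import numpy as np

def register_by_flow(warped, flow):
    """Pixel-by-pixel registration of the warped clear image to the template.

    flow has shape (H, W, 2) and gives, for each template pixel, the offset at
    which the corresponding warped-image pixel is found, so sampling the warped
    image at (x + u, y + v) places every pixel onto the template grid."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(warped, map_x, map_y, cv2.INTER_LINEAR)
```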
7) Repeating the steps 4), 5) and 6) until all the clear images are transformed to the coordinate system of the template base map, and finally fusing the images by adopting a weighted average method, wherein the formula is as follows:
P(x,y)=w1*P1(x,y)+w2*P2(x,y)
where P represents the pixel value of the fused image, P1 and P2 represent the pixel values of the overlapping areas of two clear images overlapping each other, w1, w2 represent the weights of the pixel values of the two images, and the simplest method is to set the values to 0.5 and 0.5, and other suitable values can also be set.
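A direct sketch of this weighted-average fusion follows; the per-image masks marking valid (covered) pixels are assumed to come from the warping step above, and w1 = w2 = 0.5 is the simplest choice mentioned in the text.

```python
import numpy as np

def blend_overlap(p1, p2, mask1, mask2, w1=0.5, w2=0.5):
    """Weighted-average fusion of two aligned clear images in the template
    coordinate system: P = w1*P1 + w2*P2 in the overlap, and the single
    available image elsewhere. mask1/mask2 are HxW arrays, nonzero where valid."""
    out = np.zeros_like(p1, dtype=np.float32)
    overlap = (mask1 > 0) & (mask2 > 0)
    only1 = (mask1 > 0) & ~overlap
    only2 = (mask2 > 0) & ~overlap
    out[overlap] = w1 * p1[overlap].astype(np.float32) + w2 * p2[overlap].astype(np.float32)
    out[only1] = p1[only1]
    out[only2] = p2[only2]
    return out.astype(p1.dtype)
```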
Specifically, as shown in fig. 1, the method for efficiently splicing multiple images of a two-stage zoom camera based on SuperGlue has the following specific implementation process:
1) As shown in fig. 2, images are collected from a top-down view with the pan-tilt camera; images of different definition and position can be captured by adjusting the magnification and the pitch and yaw angles of the pan-tilt.
2) 2-3 large-angle-of-view, low-definition images covering the specified range are captured by adjusting the magnification of the pan-tilt camera, the template base map is obtained by stitching them with the traditional sift+cann method, and the result is up-sampled and divided into blocks; the template base map obtained by stitching 2 large-angle-of-view images is shown in fig. 3. Shooting preset positions are then calculated according to the required definition, the clear images are acquired automatically, and all the clear images cover the specified range.
3) A clear image and a template base map block are selected, feature extraction and matching are performed with the SuperPoint and SuperGlue algorithms, and mismatched points are removed with the random sample consensus (RANSAC) method to obtain accurate matching point pairs; as shown in fig. 4, the traditional Sift method produces a large number of mismatches.
4) The pixel coordinates of the template base map block in the matching point pairs are restored to their positions in the original template base map, and the homography transformation matrix is calculated; as shown in figs. 5a and 5b, the clear image is transformed onto the coordinate system of the original template base map according to the homography transformation matrix.
5) A mask is generated from the coordinates of the four vertices of the transformed clear image, giving the template base map region covered by the transformed clear image, and the optical flow between this region and the transformed clear image is calculated.
6) The calculated optical flow contains the position offset of each pixel in the x and y directions, and the transformed clear image is registered pixel by pixel according to this offset so that it is aligned with the template base map. Figs. 6a and 6b show the local alignment before and after optical flow registration; after optical flow registration, the re-projected images are aligned accurately.
It should be understood that although the present disclosure describes embodiments, not every embodiment is provided with a separate technical solution, and this description is for clarity only, and those skilled in the art should consider the disclosure as a whole, and the technical solutions of the embodiments may be combined appropriately to form other embodiments that can be understood by those skilled in the art.

Claims (10)

1. The super glue-based multi-image stitching method for the two-stage zoom camera is characterized by comprising the following steps of:
firstly, reducing the multiplying power of a pan-tilt camera, shooting a plurality of images with large angles of view, and performing image stitching by adopting a feature matching algorithm to generate a template base image;
step two, up-sampling and blocking the template base map to generate a template base map block;
step three, calculating and recording the position of a shooting point by combining a field geometric model of a camera and an area needing shooting and combining a field angle of a tripod head camera, automatically acquiring clear images, and enabling all acquired clear images to cover a designated area;
selecting a clear image, selecting the template base image blocks with approximately the same coverage area, performing feature extraction and matching on the clear image and the template base image blocks by adopting SuperPoint and SuperGlue algorithms, removing mismatching points, restoring pixel point positions of template base image blocks in the matching points to pixel positions in the original template base image, and calculating a homography transformation matrix;
step five, transforming the clear image to a coordinate system of an original template base image by using the homography transformation matrix, obtaining a template base image of a coverage area of the transformed clear image, and calculating an optical flow between the coverage area template base image and the transformed clear image;
step six, carrying out pixel-by-pixel registration on the transformed clear image according to the optical flow so as to align the clear image with the template base map;
and seventhly, repeating the fourth step to the sixth step until all the clear images are transformed to the coordinate system of the template base map, and finally fusing the images.
2. The image stitching method according to claim 1, wherein the calculating the position of the shooting point in the third step specifically includes:
assume that when the camera faces straight down, the spatial coordinates of the four corners A, B, C and D of the camera field-of-view region on the imaging plane at a distance h are determined by h, the horizontal angle of view α and the vertical angle of view β; when the pan-tilt rotates left-right or up-down by an angle θ, each of the points A, B, C and D is denoted uniformly as v0, and the coordinates after rotation and translation are v′0 = T·R(z)·R(x)·v0, where R(z) and R(x) denote the rotation matrices about the z axis and the x axis respectively and T denotes the translation; the intersection points of the rays through v′0 with the ground are then the coordinates of the four corner points of the camera field-of-view footprint.
3. The image stitching method according to claim 1, wherein the optical flow calculation in the fifth step is as follows:
I(x, y, t) = I(x+dx, y+dy, t+dt) expresses that a pixel point moves by (dx, dy) from one frame to the next in a time dt while its gray value remains unchanged; by Taylor expansion,
I(x+dx, y+dy, t+dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt,
from which it is further obtained that
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0;
let u = dx/dt and v = dy/dt denote the velocity components of the optical flow along the x and y axes respectively, so that
I_x·u + I_y·v + I_t = 0,
where I_x, I_y and I_t denote the partial derivatives of the image gray values in the X, Y and T directions respectively, which can be obtained from the image data, and (u, v) is the optical flow along the X and Y axes.
4. The image stitching method according to claim 3, wherein in the sixth step, the transformed clear image is registered pixel by pixel according to the optical flow, so that the process of aligning the template base map uses the following calculation formula:
F(x+u,y+v)=P(x,y);
wherein P is a clear image coordinate pixel value before aligning the template base map, F is a clear image pixel value after aligning the template base map, X and Y represent pixel coordinates, and u and v represent optical flow magnitudes in the X and Y directions of the pixel positions.
5. The image stitching method according to claim 1, wherein the fusing of the images in the seventh step specifically adopts the following calculation formula:
P(x,y)=w1*P1(x,y)+w2*P2(x,y)
wherein P represents the pixel value of the fused image, P1 and P2 respectively represent the pixel values of the overlapping areas of two mutually overlapped clear images, and w1 and w2 represent the weights of the pixel values of the two images.
6. A SuperGlue-based two-stage zoom camera multi-image stitching system, the image stitching system comprising:
the image acquisition and template base image generation module is used for reducing the multiplying power of the cradle head camera, shooting a plurality of large-angle-of-view images, and performing image stitching by adopting a feature matching algorithm to generate a template base image;
the template base map block generation module is used for upsampling and blocking the template base map to generate a template base map block;
the clear image acquisition module is used for calculating and recording the position of a shooting point by combining a view field geometric model of a camera and a region needing to be shot and combining a view field angle of a cradle head camera, automatically acquiring clear images and enabling all acquired clear images to cover a designated region;
the transformation matrix calculation module is used for selecting a clear image, selecting the template base image blocks with approximately the same coverage area, carrying out feature extraction and matching on the clear image and the template base image blocks by adopting SuperPoint and SuperGlue algorithms, removing mismatching points, restoring the pixel point positions of the template base image blocks in the matching points to the pixel positions in the original template base image, and calculating a homography transformation matrix;
the optical flow calculation module is used for transforming the clear image to a coordinate system of an original template base image by using the homography transformation matrix, obtaining a template base image of a coverage area of the transformed clear image, and calculating the optical flow between the coverage area template base image and the transformed clear image;
the pixel matching module is used for carrying out pixel-by-pixel registration on the transformed clear image according to the optical flow so as to align the clear image with the template base map;
and the image fusion module is used for fusing the images after all the clear images are transformed to the coordinate system of the template base map.
7. The image stitching system of claim 6 wherein the sharp image acquisition module calculates the location of the capture point, comprising:
assume that when the camera faces straight down, the spatial coordinates of the four corners A, B, C and D of the camera field-of-view region on the imaging plane at a distance h are determined by h, the horizontal angle of view α and the vertical angle of view β; when the pan-tilt rotates left-right or up-down by an angle θ, each of the points A, B, C and D is denoted uniformly as v0, and the coordinates after rotation and translation are v′0 = T·R(z)·R(x)·v0, where R(z) and R(x) denote the rotation matrices about the z axis and the x axis respectively and T denotes the translation; the intersection points of the rays through v′0 with the ground are then the coordinates of the four corner points of the camera field-of-view footprint.
8. The image stitching system of claim 6 wherein the optical flow calculation module calculates optical flow as follows:
I(x, y, t) = I(x+dx, y+dy, t+dt) expresses that a pixel point moves by (dx, dy) from one frame to the next in a time dt while its gray value remains unchanged; by Taylor expansion,
I(x+dx, y+dy, t+dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt,
from which it is further obtained that
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0;
let u = dx/dt and v = dy/dt denote the velocity components of the optical flow along the x and y axes respectively, so that
I_x·u + I_y·v + I_t = 0,
where I_x, I_y and I_t denote the partial derivatives of the image gray values in the X, Y and T directions respectively, which can be obtained from the image data, and (u, v) is the optical flow along the X and Y axes.
9. The image stitching system of claim 8 wherein the pixel matching module performs pixel-by-pixel registration of the transformed sharp image based on the optical flow to align the template base map using the following calculation formula:
F(x+u,y+v)=P(x,y);
wherein P is a clear image coordinate pixel value before aligning the template base map, F is a clear image pixel value after aligning the template base map, X and Y represent pixel coordinates, and u and v represent optical flow magnitudes in the X and Y directions of the pixel positions.
10. The image stitching system of claim 6 wherein the image fusion module performs image fusion using the following calculation formula:
P(x,y)=w1*P1(x,y)+w2*P2(x,y)
wherein P represents the pixel value of the fused image, P1 and P2 respectively represent the pixel values of the overlapping areas of two mutually overlapped clear images, and w1 and w2 represent the weights of the pixel values of the two images.
CN202310459405.0A 2023-04-23 2023-04-23 Super-glue-based two-stage zoom camera multi-image stitching method and system Pending CN116468609A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310459405.0A CN116468609A (en) 2023-04-23 2023-04-23 Super-glue-based two-stage zoom camera multi-image stitching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310459405.0A CN116468609A (en) 2023-04-23 2023-04-23 Super-glue-based two-stage zoom camera multi-image stitching method and system

Publications (1)

Publication Number Publication Date
CN116468609A true CN116468609A (en) 2023-07-21

Family

ID=87180489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310459405.0A Pending CN116468609A (en) 2023-04-23 2023-04-23 Super-glue-based two-stage zoom camera multi-image stitching method and system

Country Status (1)

Country Link
CN (1) CN116468609A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218244A (en) * 2023-11-07 2023-12-12 武汉博润通文化科技股份有限公司 Intelligent 3D animation model generation method based on image recognition
CN117218244B (en) * 2023-11-07 2024-02-13 武汉博润通文化科技股份有限公司 Intelligent 3D animation model generation method based on image recognition
CN117474906A (en) * 2023-12-26 2024-01-30 合肥吉麦智能装备有限公司 Spine X-ray image matching method and intraoperative X-ray machine resetting method
CN117474906B (en) * 2023-12-26 2024-03-26 合肥吉麦智能装备有限公司 Intraoperative X-ray machine resetting method based on spine X-ray image matching

Similar Documents

Publication Publication Date Title
TWI709107B (en) Image feature extraction method and saliency prediction method including the same
CN110211043B (en) Registration method based on grid optimization for panoramic image stitching
CN110782394A (en) Panoramic video rapid splicing method and system
CN116468609A (en) Super-glue-based two-stage zoom camera multi-image stitching method and system
CN107194991B (en) Three-dimensional global visual monitoring system construction method based on skeleton point local dynamic update
WO2015127847A1 (en) Super resolution processing method for depth image
WO2018235163A1 (en) Calibration device, calibration chart, chart pattern generation device, and calibration method
JP2015022510A (en) Free viewpoint image imaging device and method for the same
CN113077519B (en) Multi-phase external parameter automatic calibration method based on human skeleton extraction
CN109448105B (en) Three-dimensional human body skeleton generation method and system based on multi-depth image sensor
CN112734863A (en) Crossed binocular camera calibration method based on automatic positioning
CN110008779A (en) A kind of stereoscopic two-dimensional code processing method and processing device
CN115937288A (en) Three-dimensional scene model construction method for transformer substation
CN110580720A (en) camera pose estimation method based on panorama
CN112017302A (en) Real-time registration method of projection mark and machine vision based on CAD model
CN115330594A (en) Target rapid identification and calibration method based on unmanned aerial vehicle oblique photography 3D model
Pathak et al. Dense 3D reconstruction from two spherical images via optical flow-based equirectangular epipolar rectification
KR20190008306A (en) Method and apparatus for developing a lens image into a panoramic image
CN108830921A (en) Laser point cloud reflected intensity correcting method based on incident angle
CN113450416A (en) TCSC (thyristor controlled series) method applied to three-dimensional calibration of three-view camera
Li et al. Research on multiview stereo mapping based on satellite video images
CN116363290A (en) Texture map generation method for large-scale scene three-dimensional reconstruction
CN112132971B (en) Three-dimensional human modeling method, three-dimensional human modeling device, electronic equipment and storage medium
CN112102504A (en) Three-dimensional scene and two-dimensional image mixing method based on mixed reality
CN109272445B (en) Panoramic video stitching method based on spherical model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination