CN118071655A - Fisheye image correction and stitching method and system - Google Patents

Fisheye image correction and stitching method and system

Info

Publication number
CN118071655A
Authority
CN
China
Prior art keywords
image
fisheye
distortion
mapping
blocks
Prior art date
Legal status
Pending
Application number
CN202410464709.0A
Other languages
Chinese (zh)
Inventor
刘寒松
王国强
王永
刘瑞
李越
李贤超
Current Assignee
Sonli Holdings Group Co Ltd
Original Assignee
Sonli Holdings Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Sonli Holdings Group Co Ltd
Priority to CN202410464709.0A
Publication of CN118071655A
Legal status: Pending

Landscapes

  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of image processing and relates to a fisheye image correction and stitching method and system. Fisheye images shot by a fisheye lens are first collected and divided into 2D image blocks. A self-supervised pre-training module then performs self-supervised pre-training to extract a fine-grained distortion characterization of the fisheye image, and a fisheye image correction module learns a full-scale mapping flow to obtain a corrected fisheye image pair. The corrected fisheye image pair is divided into a target image and a reference image, and finally the mapped target image and the reference image are fed into a SiameseMAE network for blending to generate the final panoramic stitched image. The method can adapt to different fisheye image stitching scenes, including different illumination conditions, backgrounds and other factors, which improves its universality and lets it perform better in diversified application environments.

Description

Fisheye image correction and stitching method and system
Technical Field
The invention belongs to the technical field of image processing, and relates to a fisheye image correction and stitching method and system.
Background
The fisheye lens is a wide-angle lens whose design allows the camera to capture a much wider field of view, which alleviates the blind-spot problem of conventional cameras and improves the comprehensiveness of monitoring. Fewer cameras are therefore needed during installation, which greatly reduces installation and maintenance costs. With increasing urban traffic pressure and a rapidly growing number of vehicles, using fisheye cameras at traffic intersections, highways, parking lots and other places can improve the efficiency of traffic supervision, help reduce traffic accidents and improve traffic flow. The wide field of view of the fisheye lens therefore makes it an ideal choice in monitoring systems, provides a more flexible and comprehensive solution for monitoring in various fields, and offers reliable support for the safety and development of society.
Although fisheye lenses have the unique advantage of providing an extremely wide viewing angle, their special optics make the captured image exhibit the characteristics of a spherical projection; such distortion not only affects the appearance of the image but can also lead to misjudgment of the size and shape of objects. Fisheye image correction and stitching are therefore critical techniques for overcoming these potential vision problems. Fisheye image correction can eliminate or minimize the distortion so that the image better matches the actual scene, effectively improving image quality and accuracy, while stitching allows multiple fisheye images to be seamlessly fused into a continuous, complete panoramic image. Through stitching, a user obtains a more comprehensive and complete view without being limited by the viewing angle of a single camera, which plays an important role in monitoring systems, virtual navigation, data acquisition under specific circumstances, and the like.
In summary, against the background of fisheye lens applications, fisheye image correction and stitching deserve attention. To improve the quality and accuracy of fisheye images, researchers continue to explore and develop fisheye image correction techniques, fisheye image stitching techniques, deep learning algorithms and other methods, so as to provide more effective technical means for monitoring in various fields.
Disclosure of Invention
To solve the problem of image distortion and deformation caused by the fisheye lens, the invention provides a fisheye image correction and stitching method and system. Based on self-supervised representation learning, an effective method is introduced for encoding the fine-grained distortion characterization of fisheye images and fine-tuning the fisheye image correction network; a pixel-warping-based method then handles the large-parallax problem, and the corrected fisheye images are stitched into a seamless, coherent panoramic image, so that a more comprehensive and complete field of view is obtained.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, the invention provides a fisheye image correction stitching method, which comprises the following steps:
S1, constructing a fisheye image dataset: collecting a fisheye image shot by a fisheye lens, constructing a fisheye image data set, and dividing the fisheye image into 2D image blocks;
S2, self-supervision pre-training: self-supervision pre-training is carried out by adopting a self-supervision pre-training module, and fine granularity distortion characterization of the fisheye image is extracted;
S3, correcting the fisheye image: using the extracted fine-grained distortion characterization and learning a full-scale mapping flow through a fisheye image correction module to obtain corrected fisheye image pairs;
S4, pixel-level image mapping: dividing the corrected fisheye image pair into a target image and a reference image, repositioning the pixels of the target image onto the reference image plane with mapping vectors, directly estimating a two-dimensional mapping vector for the overlapping region in the target image, and regularizing the loss of pixels in the non-overlapping region to obtain the mapped target image;
S5, stitching fisheye images: the mapped target image and the reference image are fed to a SiameseMAE network and blended to generate the final panoramic stitched image.
As a further technical scheme of the invention, the fisheye image in step S1 has height H and width W and is divided into 2D image blocks of size S×S, so that the number of 2D image blocks is N = HW/S².
As a further technical scheme of the invention, the specific process of the step S2 is as follows:
S21, flattening the 2D image blocks and mapping them to D dimensions by linear projection to obtain the image block embeddings;
S22, designing a distortion position map specific to the fisheye image, wherein each value in the distortion position map represents the degree of distortion of the corresponding image block and is obtained from the radial distance of the image block from the image center;
S23, randomly shuffling the image block embeddings along the first dimension, adding position embeddings to the shuffled image block embeddings to obtain new image block embeddings, and then feeding the new image block embeddings into a vision transformer network to obtain abstract image block representations;
S24, shuffling and reshaping the distortion position map in the same way as step S23 to obtain distortion degree labels, each distortion degree label corresponding to one abstract image block representation;
S25, taking image blocks with the same distortion degree as positive examples and image blocks with different distortion degrees as negative examples, and encoding the locally unique distortion in the different abstract image block representations by contrastive learning; in the contrastive loss, the i-th abstract image block representation is contrasted with its positive index set, namely the indices of the image blocks that share its distortion degree, an indicator function determines whether a term is calculated, and the total contrastive loss is computed over all N abstract image block representations;
S26, performing end-to-end pre-training optimization of the self-supervision pre-training module on the fisheye image dataset, with the contrastive loss as the training target, and extracting the fine-grained distortion characterization of the fisheye image through the pre-trained self-supervision pre-training module.
As a further technical scheme of the invention, the specific process of the step S3 is as follows:
S31, obtaining the image block embeddings of the fisheye image by the same process as step S21 and feeding them directly into the vision transformer network to obtain abstract image block representations;
S32, reshaping the abstract image block representations into a two-dimensional feature map and estimating the weights of adjacent pixels for up-sampling, with the following specific flow:
S321, generating a coarse-scale mapping flow with two channels using two convolution layers, wherein the first channel represents the offset of the image in the x direction and the second channel represents the offset in the y direction;
S322, predicting an up-sampling weight map using two convolution layers and performing a normalization (softmax) operation over the weights of each pixel's neighbourhood to obtain a normalized weight map;
S323, rearranging and combining the coarse-scale mapping flow and the normalized weight map obtained in the above steps into the full-scale mapping flow;
S324, obtaining the corrected image by bilinear sampling, wherein each pixel coordinate of the corrected image is sampled from the fisheye image at the pixel coordinate predicted by the full-scale mapping flow.
As a further technical solution of the present invention, the training objective of the fisheye image correction module in step S3 is the L1 distance between the predicted full-scale mapping flow and the given ground-truth mapping flow, computed over the true effective foreground region of the rectified image.
As a further technical scheme of the invention, the specific process of the step S4 is as follows:
S41, selecting a pair of corrected fisheye images and defining one as the target image and the other as the reference image; a feature encoder first maps the target image and the reference image to dense feature maps at a lower resolution, yielding image features with d = 256 channels, and a context network with the same structure as the feature encoder extracts context features from the target image;
S42, forming a visual correlation volume from the target and reference image features by taking the dot product between all pairs of feature vectors: C(i, j, k, l) = Σ_h F_t(i, j, h) · F_r(k, l, h), wherein (i, j) is the spatial position of a feature in the target image features F_t, (k, l) is the spatial position of a feature in the reference image features F_r, h is the component index of the feature vector, F_t(i, j, h) is the value of the h-th channel of the target feature tensor at position (i, j), and F_r(k, l, h) is the value of the h-th channel of the reference feature tensor at position (k, l);
S43, indexing the context features of the target image with the visual correlation volume to generate a correlation feature map m;
S44, given the current mapping estimate, mapping each pixel of the correlation feature map m to its estimated position in the reference image;
S45, directly estimating a two-dimensional mapping vector for the overlapping region in the target image and regularizing the loss of pixels in the non-overlapping region; with the overlapping and non-overlapping regions defined on the target image and a true pixel-level mapping given, the loss function supervises the predicted mapping against the real mapping on the overlapping region while regularizing the non-overlapping region; the predicted pixel-level mapping is then used to relocate the pixels of the target image, yielding the mapped target image.
As a further aspect of the present invention, the feature encoder in step S41 is composed of 6 residual blocks, with 2 residual blocks at each of three progressively lower resolutions.
As a further technical scheme of the invention, the panoramic stitched image obtained in step S5 is generated by the SiameseMAE network, which is based on a Transformer architecture: the two images are first converted into a series of non-overlapping image blocks, the image blocks are embedded with a linear projection, position embedding information is added, and a [cls] token is appended; the network is encouraged to learn the correlation between the two images through asymmetric masking, in which one image is left unmasked while 90% of the image blocks of the other image are masked; the image blocks are then encoded with a weight-sharing Transformer encoder, the embeddings of one encoded image serve as keys and values, the embeddings of the other as queries, and they are fed into a Transformer decoder to obtain the panoramic stitched image.
In a second aspect, the present invention provides a fisheye image correction stitching system, comprising:
The fisheye image data set construction module is used for collecting fisheye images shot by the fisheye lens and dividing the fisheye images into 2D image blocks;
the self-supervision pre-training module is used for learning fine-granularity distortion characteristic representation of the fisheye image;
the fisheye image correction module is used for correcting the fisheye image pair by using the fine-granularity distortion characteristic representation;
The pixel-level mapping module is used for dividing the fisheye image obtained by the fisheye image correction module into a target image and a reference image, and then repositioning each pixel in the target image to obtain a mapped target image;
and the fish-eye stitching image generation module is used for mixing the mapped target image with the reference image to generate a final panoramic stitching image.
In a third aspect, the invention provides an electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the method of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a fisheye image correction and stitching method and system based on a SiameseMAE network, aiming to solve the problem of image distortion and deformation caused by the fisheye lens and thereby improve the quality and accuracy of fisheye images. The innovation lies mainly in three aspects: a self-supervised distortion perception method based on the distortion characteristics of fisheye images, a pixel-level fisheye image mapping method, and a fisheye stitching method based on the SiameseMAE network, with the following advantages:
(1) The self-supervision method does not need external calibration data, and is more flexible in coping with the distortion problems of different fisheye images based on the fisheye image distortion characteristics.
(2) The pixel-level fisheye image mapping method processes the fisheye image pair at the pixel level, is beneficial to maintaining the detail and accuracy of the image and reducing the image distortion.
(3) The fisheye stitching method based on the SiameseMAE network can adapt to different fisheye image stitching scenes, including different illumination conditions, backgrounds and other factors, which improves the universality of the method so that it performs better in diversified application environments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the present disclosure and do not constitute a limitation on the invention.
Fig. 1 is a schematic flow chart of a fisheye image correction and stitching method provided by the invention.
Fig. 2 is a diagram of the SiameseMAE network structure provided in the present invention.
Fig. 3 is a network structure block diagram of a fisheye image correction splicing system provided by the invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1:
as shown in fig. 1, the embodiment provides a fisheye image correction and stitching method, which includes the following steps:
S1, constructing a fisheye image dataset: collecting the fisheye images shot by a fisheye lens, constructing a fisheye image dataset, and dividing each fisheye image of height H and width W into 2D image blocks of size S×S, so that the number of 2D image blocks is N = HW/S².
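For illustration, a minimal sketch of the block division in step S1 is given below, assuming a PyTorch tensor layout; the function name, the example image size, and the block size of 16 are illustrative assumptions and not taken from the patent.

```python
import torch

def split_into_blocks(fisheye: torch.Tensor, block_size: int) -> torch.Tensor:
    """Split a fisheye image of shape (C, H, W) into N = HW / S^2 flattened blocks of shape (N, C*S*S)."""
    c, h, w = fisheye.shape
    s = block_size
    assert h % s == 0 and w % s == 0, "H and W must be divisible by the block size S"
    # (C, H, W) -> (C, H/S, S, W/S, S) -> (H/S, W/S, C, S, S) -> (N, C*S*S)
    blocks = (fisheye
              .reshape(c, h // s, s, w // s, s)
              .permute(1, 3, 0, 2, 4)
              .reshape((h // s) * (w // s), c * s * s))
    return blocks

# Example: a 640x640 three-channel frame split into 16x16 blocks gives N = 1600 blocks,
# each flattened to 3*16*16 = 768 values, consistent with N = HW/S^2.
image = torch.rand(3, 640, 640)
patches = split_into_blocks(image, 16)
print(patches.shape)  # torch.Size([1600, 768])
```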
S2, self-supervision pre-training: the fisheye image is divided into image blocks and the blocks are embedded; a distortion position map designed around the inherent distortion characteristics of the fisheye image provides the labels; the image block embeddings are mapped to abstract representations; and contrastive learning pulls together abstract representations that share the same distortion pattern while pushing apart the rest, so that different local distortion patterns are distinguished and a fine-grained distortion characterization is extracted. The specific steps are as follows:
S21, flattening the 2D image blocks and mapping them to D dimensions by linear projection to obtain the image block embeddings;
S22, designing a distortion position map specific to the fisheye image, wherein each value in the distortion position map represents the degree of distortion of the corresponding image block and is obtained from the radial distance of the image block from the image center;
S23, after obtaining the image block embeddings and the distortion position map, and in order to avoid a trivial mapping between the position embeddings and specific distortion patterns, randomly shuffling the image block embeddings along the first dimension; position embeddings are added to the shuffled image block embeddings to obtain new image block embeddings, which are then fed into a vision transformer network to obtain abstract image block representations;
S24, shuffling and reshaping the distortion position map in the same way as step S23 to obtain distortion degree labels, each distortion degree label corresponding to one abstract image block representation;
S25, observing fisheye images shows that their distortion is radially distributed and that the degree of distortion increases with distance from the image center; therefore, image blocks with the same distortion degree are regarded as positive examples and image blocks with different distortion degrees as negative examples, and contrastive learning is used to encode the locally unique distortion in the different abstract image block representations (see the sketch after step S26); in the contrastive loss, the i-th abstract image block representation is contrasted with its positive index set, namely the indices of the image blocks that share its distortion degree, an indicator function determines whether a term is calculated, and the total contrastive loss is computed over all N abstract image block representations;
S26, performing end-to-end pre-training optimization of the self-supervision pre-training module on the fisheye image dataset, with the contrastive loss as the training target, and extracting the fine-grained distortion characterization of the fisheye image through the pre-trained self-supervision pre-training module.
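Because the loss expression itself is not reproduced above, the following is only a hedged sketch of the distortion-aware contrastive objective described in steps S22–S25, written in PyTorch; the way distortion degrees are binned from the block radius, the temperature, and all names are illustrative assumptions rather than the patent's exact formulation.

```python
import torch
import torch.nn.functional as F

def distortion_degree_labels(grid_h: int, grid_w: int, num_bins: int = 8) -> torch.Tensor:
    """Assign each image block a discrete distortion degree from its radial distance to the
    image center, mimicking the distortion position map of step S22 (binning is an assumption)."""
    ys, xs = torch.meshgrid(torch.arange(grid_h), torch.arange(grid_w), indexing="ij")
    cy, cx = (grid_h - 1) / 2.0, (grid_w - 1) / 2.0
    radius = torch.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    radius = radius / radius.max()                              # normalize to [0, 1]
    labels = torch.clamp((radius * num_bins).long(), max=num_bins - 1)
    return labels.flatten()                                     # one label per block

def contrastive_distortion_loss(reps: torch.Tensor, labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Supervised-contrastive-style loss: blocks with the same distortion degree are positives,
    all other blocks are negatives (an assumed instantiation of step S25)."""
    z = F.normalize(reps, dim=-1)                               # (N, D) abstract block representations
    sim = z @ z.t() / temperature                               # (N, N) similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))             # the indicator excludes the anchor itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # log-softmax over non-anchor entries
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_count
    return loss.mean()                                          # averaged over all N representations
```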
S3, correcting the fisheye image: after self-supervision pre-training, the network provides a good fine-grained distortion characterization of the fisheye image. In the following fisheye image correction module, the pre-trained weights are used for initialization, so the network model benefits from the pre-trained model; because the pre-trained weights were trained on a large dataset, the network directly contains rich fine-grained distortion information. This step helps the model converge faster, reduces training time, and allows the network to generalize better to new data when data are scarce. The specific steps are as follows:
S31, obtaining the image block embeddings of the fisheye image by the same process as step S21 and feeding them directly into the vision transformer network to obtain abstract image block representations;
S32, reshaping the abstract image block representations into a two-dimensional feature map and estimating the weights of adjacent pixels for up-sampling, with the following specific flow:
S321, generating a coarse-scale mapping flow with two channels using two convolution layers, wherein the first channel represents the offset of the image in the x direction and the second channel represents the offset in the y direction;
S322, predicting an up-sampling weight map using two convolution layers and performing a normalization (softmax) operation over the weights of each pixel's neighbourhood to obtain a normalized weight map;
S323, rearranging and combining the coarse-scale mapping flow and the normalized weight map obtained in the above steps into the full-scale mapping flow;
S324, obtaining the corrected image by bilinear sampling, wherein each pixel coordinate of the corrected image is sampled from the fisheye image at the pixel coordinate predicted by the full-scale mapping flow. In step S3, the training objective of the fisheye image correction module is the L1 distance between the predicted full-scale mapping flow and the given ground-truth mapping flow, computed over the true effective foreground region of the rectified image.
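Steps S321–S324 are reproduced above only in outline, so the sketch below shows one plausible reading under stated assumptions: a coarse two-channel mapping flow is up-sampled with softmax-normalized weights over each pixel's 3×3 neighbourhood (RAFT-style convex up-sampling), and the corrected image is then obtained by bilinear sampling with grid_sample. The up-sampling factor of 8, the neighbourhood size, and the offset scaling are assumptions, not the patent's exact layout.

```python
import torch
import torch.nn.functional as F

def convex_upsample(coarse_flow: torch.Tensor, weights: torch.Tensor, factor: int = 8) -> torch.Tensor:
    """Up-sample a coarse 2-channel mapping flow (B, 2, h, w) to full resolution using
    softmax-normalized weights over each pixel's 3x3 neighbourhood (B, 9*factor*factor, h, w)."""
    b, _, h, w = coarse_flow.shape
    weights = weights.view(b, 1, 9, factor, factor, h, w)
    weights = torch.softmax(weights, dim=2)                       # normalize the 9 neighbour weights
    # scale offsets to full-resolution pixel units (an assumption) and gather 3x3 neighbourhoods
    neighbours = F.unfold(factor * coarse_flow, kernel_size=3, padding=1)     # (B, 2*9, h*w)
    neighbours = neighbours.view(b, 2, 9, 1, 1, h, w)
    up = (weights * neighbours).sum(dim=2)                        # (B, 2, factor, factor, h, w)
    up = up.permute(0, 1, 4, 2, 5, 3).reshape(b, 2, factor * h, factor * w)
    return up

def rectify_by_sampling(fisheye: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample the fisheye image (B, C, H, W) at the coordinates predicted by the
    full-scale mapping flow (B, 2, H, W), yielding the corrected image."""
    b, _, hh, ww = flow.shape
    ys, xs = torch.meshgrid(torch.arange(hh, device=flow.device),
                            torch.arange(ww, device=flow.device), indexing="ij")
    grid_x = xs[None] + flow[:, 0]                                # predicted x-coordinates
    grid_y = ys[None] + flow[:, 1]                                # predicted y-coordinates
    # normalize coordinates to [-1, 1] as required by grid_sample
    grid = torch.stack((2 * grid_x / (ww - 1) - 1, 2 * grid_y / (hh - 1) - 1), dim=-1)
    return F.grid_sample(fisheye, grid, mode="bilinear", align_corners=True)
```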
S4, mapping at a pixel level: after correction, the two fisheye images come from different views of the same scene and inevitably share an overlapping region. Because the two images come from the same scene, stitching them with a conventional shared affine matrix would introduce ghosting and degrade the quality of the stitching result. Therefore, for the characteristics of fisheye image stitching, no shared transformation function is used; instead, a two-dimensional mapping vector is estimated directly for every pixel to obtain the warped target image. Specifically:
S41, selecting a pair of corrected fisheye images and defining one as the target image and the other as the reference image; a feature encoder first maps the target image and the reference image to dense feature maps at a lower resolution, yielding image features with d = 256 channels, and a context network with the same structure as the feature encoder extracts context features from the target image; the feature encoder consists of 6 residual blocks, with 2 residual blocks at each of three progressively lower resolutions;
S42, forming a visual correlation volume from the target and reference image features by taking the dot product between all pairs of feature vectors: C(i, j, k, l) = Σ_h F_t(i, j, h) · F_r(k, l, h), wherein (i, j) is the spatial position of a feature in the target image features F_t, (k, l) is the spatial position of a feature in the reference image features F_r, h is the component index of the feature vector, F_t(i, j, h) is the value of the h-th channel of the target feature tensor at position (i, j), and F_r(k, l, h) is the value of the h-th channel of the reference feature tensor at position (k, l);
S43, indexing the context features of the target image with the visual correlation volume to generate a correlation feature map m;
S44, given the current mapping estimate, mapping each pixel of the correlation feature map m to its estimated position in the reference image;
S45, directly estimating a two-dimensional mapping vector for the overlapping region in the target image; because correspondences are missing, mapping the pixels of the non-overlapping region is an ill-posed problem, so the loss of the non-overlapping region is regularized to account for this. With the overlapping and non-overlapping regions defined on the target image and a true pixel-level mapping given, the loss function supervises the predicted mapping against the real mapping on the overlapping region while regularizing the non-overlapping region. The predicted pixel-level mapping is then used to relocate the pixels of the target image, yielding the mapped target image.
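A hedged sketch of the all-pairs correlation volume of step S42 and the pixel-level mapping loss of step S45 follows. Since the exact treatment of the non-overlapping region is not reproduced above, the smoothness regularizer, its weight, and the feature scaling are assumptions for illustration only.

```python
import torch

def correlation_volume(feat_t: torch.Tensor, feat_r: torch.Tensor) -> torch.Tensor:
    """All-pairs dot product between target features (B, D, H, W) and reference features
    (B, D, H, W): C[b, i, j, k, l] = sum_h feat_t[b, h, i, j] * feat_r[b, h, k, l]."""
    b, d, h, w = feat_t.shape
    ft = feat_t.view(b, d, h * w)
    fr = feat_r.view(b, d, h * w)
    corr = torch.einsum("bdn,bdm->bnm", ft, fr)            # (B, H*W, H*W)
    return corr.reshape(b, h, w, h, w) / d ** 0.5          # scaling by sqrt(d) is an assumption

def pixel_mapping_loss(pred_map: torch.Tensor, true_map: torch.Tensor,
                       overlap_mask: torch.Tensor, reg_weight: float = 0.1) -> torch.Tensor:
    """Supervise the predicted 2D mapping vectors (B, 2, H, W) against the ground truth on
    the overlapping region; regularize (here: smoothness, an assumption) elsewhere."""
    mask = overlap_mask.unsqueeze(1).float()               # (B, 1, H, W), 1 inside the overlap
    supervised = (mask * (pred_map - true_map).abs()).sum() / mask.sum().clamp(min=1.0)
    # simple smoothness regularization of the mapping in the non-overlapping region
    dx = (pred_map[..., :, 1:] - pred_map[..., :, :-1]).abs()
    dy = (pred_map[..., 1:, :] - pred_map[..., :-1, :]).abs()
    non_overlap = 1.0 - mask
    reg = (non_overlap[..., :, 1:] * dx).mean() + (non_overlap[..., 1:, :] * dy).mean()
    return supervised + reg_weight * reg
```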
S5, generating the fisheye stitched image: the remapped target image and the reference image are fed to a SiameseMAE network to generate the stitching result. Specifically, the SiameseMAE network is based on a Transformer architecture, as shown in fig. 2: the two images are converted into a series of non-overlapping image blocks, the image blocks are embedded with a linear projection, position embedding information is added, and a [cls] token is appended. The network is encouraged to learn the correlation between the two images through asymmetric masking, in which one image is left unmasked while 90% of the image blocks of the other image are masked. The image blocks are then encoded with a weight-sharing Transformer encoder; the embeddings of one encoded image serve as keys and values, the embeddings of the other as queries, and they are fed into a Transformer decoder to obtain the panoramic stitched image.
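As a rough illustration only, the toy module below sketches the asymmetric-masking idea of the SiameseMAE stitching in step S5: both images are patch-embedded with shared weights, most blocks of one image are masked, and a Transformer decoder queries the masked stream against the unmasked stream's keys and values. All dimensions, the choice of which image is masked, and the reconstruction head are assumptions; this is not the patent's exact network (which also uses position embeddings and a [cls] token, omitted here for brevity).

```python
import torch
import torch.nn as nn

class TinySiameseMAE(nn.Module):
    """Toy illustration of weight-shared encoding plus cross-attention decoding."""
    def __init__(self, patch_dim: int = 768, dim: int = 256, heads: int = 8):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)                   # shared linear projection
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)   # weight-shared encoder
        dec_layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.head = nn.Linear(dim, patch_dim)                    # toy reconstruction head

    def forward(self, target_patches: torch.Tensor, reference_patches: torch.Tensor,
                mask_ratio: float = 0.9) -> torch.Tensor:
        # embed both images with the same weights
        tgt = self.embed(target_patches)                         # (B, N, dim)
        ref = self.embed(reference_patches)                      # (B, N, dim)
        # asymmetric masking: keep only (1 - mask_ratio) of the target blocks
        n = tgt.size(1)
        keep = max(1, int(n * (1.0 - mask_ratio)))
        idx = torch.randperm(n, device=tgt.device)[:keep]
        tgt_visible = tgt[:, idx]
        # weight-shared encoding of both streams
        tgt_enc = self.encoder(tgt_visible)
        ref_enc = self.encoder(ref)
        # the masked stream queries the unmasked stream (keys/values)
        fused = self.decoder(tgt_enc, ref_enc)
        return self.head(fused)                                  # (B, keep, patch_dim)
```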
Example 2:
the embodiment provides a fisheye image correction stitching system, as shown in fig. 3, including:
The fisheye image data set construction module is used for collecting fisheye images shot by the fisheye lens and dividing the fisheye images into 2D image blocks;
the self-supervision pre-training module is used for learning fine-granularity distortion characteristic representation of the fisheye image;
the fisheye image correction module is used for correcting the fisheye image pair by using the fine-granularity distortion characteristic representation;
and the pixel-level mapping module is used for dividing the fisheye image obtained by the fisheye image correction module into a target image and a reference image, and then repositioning each pixel in the target image to obtain a mapped target image.
And the fish-eye stitching image generation module is used for mixing the mapped target image with the reference image to generate a final panoramic stitching image.
Here, it should be noted that the above-mentioned modules correspond to steps S1 to S5 in embodiment 1; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to the content disclosed in embodiment 1. It should also be noted that the modules described above may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
In further embodiments, there is also provided:
An electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the method described in embodiment 1. For brevity, the description is omitted here.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate array FPGA or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.
The method in embodiment 1 may be directly embodied as a hardware processor executing or executed with a combination of hardware and software modules in the processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
Those of ordinary skill in the art will appreciate that the elements of the various examples described in connection with the present embodiments, i.e., the algorithm steps, can be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. A fisheye image correction and stitching method, characterized by comprising the following steps:
S1, constructing a fisheye image dataset: collecting a fisheye image shot by a fisheye lens, constructing a fisheye image data set, and dividing the fisheye image into 2D image blocks;
S2, self-supervision pre-training: self-supervision pre-training is carried out by adopting a self-supervision pre-training module, and fine granularity distortion characterization of the fisheye image is extracted;
S3, correcting the fisheye image: using the extracted fine-grained distortion characterization and learning a full-scale mapping flow through a fisheye image correction module to obtain corrected fisheye image pairs;
S4, pixel-level image mapping: dividing the corrected fisheye image pair into a target image and a reference image, repositioning the pixels of the target image onto the reference image plane with mapping vectors, directly estimating a two-dimensional mapping vector for the overlapping region in the target image, and regularizing the loss of pixels in the non-overlapping region to obtain the mapped target image;
S5, stitching fisheye images: the mapped target image and the reference image are fed to a SiameseMAE network and blended to generate the final panoramic stitched image.
2. The fisheye image correction and stitching method according to claim 1, wherein the fisheye image in step S1 has height H and width W and is divided into 2D image blocks of size S×S, so that the number of 2D image blocks is N = HW/S².
3. The fisheye image correction and stitching method according to claim 2, wherein the specific process of step S2 is as follows:
S21, flattening the 2D image blocks and mapping them to D dimensions by linear projection to obtain the image block embeddings;
S22, designing a distortion position map specific to the fisheye image, wherein each value in the distortion position map represents the degree of distortion of the corresponding image block and is obtained from the radial distance of the image block from the image center;
S23, randomly shuffling the image block embeddings along the first dimension, adding position embeddings to the shuffled image block embeddings to obtain new image block embeddings, and then feeding the new image block embeddings into a vision transformer network to obtain abstract image block representations;
S24, shuffling and reshaping the distortion position map in the same way as step S23 to obtain distortion degree labels, each distortion degree label corresponding to one abstract image block representation;
S25, taking image blocks with the same distortion degree as positive examples and image blocks with different distortion degrees as negative examples, and encoding the locally unique distortion in the different abstract image block representations by contrastive learning; in the contrastive loss, the i-th abstract image block representation is contrasted with its positive index set, namely the indices of the image blocks that share its distortion degree, an indicator function determines whether a term is calculated, and the total contrastive loss is computed over all N abstract image block representations;
S26, performing end-to-end pre-training optimization of the self-supervision pre-training module on the fisheye image dataset, with the contrastive loss as the training target, and extracting the fine-grained distortion characterization of the fisheye image through the pre-trained self-supervision pre-training module.
4. The fisheye image correction and stitching method according to claim 3, wherein the specific process of step S3 is as follows:
S31, obtaining the image block embeddings of the fisheye image by the same process as step S21 and feeding them directly into the vision transformer network to obtain abstract image block representations;
S32, reshaping the abstract image block representations into a two-dimensional feature map and estimating the weights of adjacent pixels for up-sampling, with the following specific flow:
S321, generating a coarse-scale mapping flow with two channels using two convolution layers, wherein the first channel represents the offset of the image in the x direction and the second channel represents the offset in the y direction;
S322, predicting an up-sampling weight map using two convolution layers and performing a normalization operation over the weights of each pixel's neighbourhood to obtain a normalized weight map;
S323, rearranging and combining the coarse-scale mapping flow and the normalized weight map obtained in the above steps into the full-scale mapping flow;
S324, obtaining the corrected image by bilinear sampling, wherein each pixel coordinate of the corrected image is sampled from the fisheye image at the pixel coordinate predicted by the full-scale mapping flow.
5. The fisheye image correction and stitching method according to claim 4, wherein the training objective of the fisheye image correction module in step S3 is the L1 distance between the predicted full-scale mapping flow and the given ground-truth mapping flow, computed over the true effective foreground region of the rectified image.
6. The fisheye image correction and stitching method according to claim 5, wherein the specific process of step S4 is:
S41, selecting a pair of corrected fisheye images and defining one as the target image and the other as the reference image; a feature encoder first maps the target image and the reference image to dense feature maps at a lower resolution, yielding image features with d = 256 channels, and a context network with the same structure as the feature encoder extracts context features from the target image; the feature encoder consists of 6 residual blocks, with 2 residual blocks at each of three progressively lower resolutions;
S42, forming a visual correlation volume from the target and reference image features by taking the dot product between all pairs of feature vectors: C(i, j, k, l) = Σ_h F_t(i, j, h) · F_r(k, l, h), wherein (i, j) is the spatial position of a feature in the target image features F_t, (k, l) is the spatial position of a feature in the reference image features F_r, h is the component index of the feature vector, F_t(i, j, h) is the value of the h-th channel of the target feature tensor at position (i, j), and F_r(k, l, h) is the value of the h-th channel of the reference feature tensor at position (k, l);
S43, indexing the context features of the target image with the visual correlation volume to generate a correlation feature map m;
S44, given the current mapping estimate, mapping each pixel of the correlation feature map m to its estimated position in the reference image;
S45, directly estimating a two-dimensional mapping vector for the overlapping region in the target image and regularizing the loss of pixels in the non-overlapping region; with the overlapping and non-overlapping regions defined on the target image and a true pixel-level mapping given, the loss function supervises the predicted mapping against the real mapping on the overlapping region while regularizing the non-overlapping region; the predicted pixel-level mapping is then used to relocate the pixels of the target image, yielding the mapped target image.
7. The fisheye image correction and stitching method according to claim 6, wherein the panoramic stitched image obtained in step S5 is generated by the SiameseMAE network, which is based on a Transformer architecture: the two images are first converted into a series of non-overlapping image blocks, the image blocks are embedded with a linear projection, position embedding information is added, and a [cls] token is appended; the network is encouraged to learn the correlation between the two images through asymmetric masking, in which one image is left unmasked while 90% of the image blocks of the other image are masked; the image blocks are then encoded with a weight-sharing Transformer encoder, the embeddings of one encoded image serve as keys and values, the embeddings of the other as queries, and they are fed into a Transformer decoder to obtain the panoramic stitched image.
8. A fisheye image correction stitching system capable of performing the method of any one of claims 1-7, comprising:
The fisheye image data set construction module is used for collecting fisheye images shot by the fisheye lens and dividing the fisheye images into 2D image blocks;
the self-supervision pre-training module is used for learning fine-granularity distortion characteristic representation of the fisheye image;
the fisheye image correction module is used for correcting the fisheye image pair by using the fine-granularity distortion characteristic representation;
The pixel-level mapping module is used for dividing the fisheye image obtained by the fisheye image correction module into a target image and a reference image, and then repositioning each pixel in the target image to obtain a mapped target image;
and the fish-eye stitching image generation module is used for mixing the mapped target image with the reference image to generate a final panoramic stitching image.
9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of any of claims 1-7.
CN202410464709.0A 2024-04-18 2024-04-18 Fisheye image correction and stitching method and system Pending CN118071655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410464709.0A CN118071655A (en) 2024-04-18 2024-04-18 Fisheye image correction and stitching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410464709.0A CN118071655A (en) 2024-04-18 2024-04-18 Fisheye image correction and stitching method and system

Publications (1)

Publication Number Publication Date
CN118071655A true CN118071655A (en) 2024-05-24

Family

ID=91097534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410464709.0A Pending CN118071655A (en) 2024-04-18 2024-04-18 Fisheye image correction and stitching method and system

Country Status (1)

Country Link
CN (1) CN118071655A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106683045A (en) * 2016-09-28 2017-05-17 深圳市优象计算技术有限公司 Binocular camera-based panoramic image splicing method
CN110099220A (en) * 2019-06-17 2019-08-06 广东中星微电子有限公司 A kind of panorama mosaic method and device
US20190325580A1 (en) * 2019-06-26 2019-10-24 Intel Corporation Surround camera system with seamless stitching for arbitrary viewpoint selection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAO FENG ET AL: "SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning", 《2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》, 17 August 2023 (2023-08-17) *
林亚 (Lin Ya): "Research on Video Stitching Methods for Dual Fisheye Cameras" [双鱼眼摄像头视频拼接方法研究], China Master's Theses Full-text Database (Information Science and Technology), 15 June 2020 (2020-06-15)

Similar Documents

Publication Publication Date Title
CN111325797B (en) Pose estimation method based on self-supervision learning
Wang et al. SFNet-N: An improved SFNet algorithm for semantic segmentation of low-light autonomous driving road scenes
CN110782490B (en) Video depth map estimation method and device with space-time consistency
CN111325794A (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
CN105046649A (en) Panorama stitching method for removing moving object in moving video
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN111354030A (en) Method for generating unsupervised monocular image depth map embedded into SENET unit
CN115376024A (en) Semantic segmentation method for power accessory of power transmission line
CN113436130A (en) Intelligent sensing system and device for unstructured light field
CN113436254B (en) Cascade decoupling pose estimation method
Liu et al. Deep learning enables parallel camera with enhanced-resolution and computational zoom imaging
CN112561979B (en) Self-supervision monocular depth estimation method based on deep learning
CN118071655A (en) Fisheye image correction and stitching method and system
CN110738696B (en) Driving blind area perspective video generation method and driving blind area view perspective system
Nie et al. Context and detail interaction network for stereo rain streak and raindrop removal
CN110766732A (en) Robust single-camera depth map estimation method
CN115330935A (en) Three-dimensional reconstruction method and system based on deep learning
CN116258740A (en) Vehicle-mounted forward-looking multi-target tracking method based on multi-camera pixel fusion
CN108876755A (en) A kind of construction method of the color background of improved gray level image
CN115272450A (en) Target positioning method based on panoramic segmentation
Zhang et al. 3D Object Detection Based on Multi-view Adaptive Fusion
Li et al. Hybrid Feature based Pyramid Network for Nighttime Semantic Segmentation.
Zhang et al. SkipcrossNets: Adaptive Skip-cross Fusion for Road Detection
CN118096534B (en) Infrared image super-resolution reconstruction method based on complementary reference
Pittner et al. LaneCPP: Continuous 3D Lane Detection using Physical Priors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination